14 Functional Programming in R
It was simple, but you know, it’s always simple when you’ve done it.
—Simone Gabbriellini
In this Chapter we aren’t going to cover any fundamentally new R powers. Instead we’ll get acquainted with just one aspect of a computer programming paradigm known as functional programming. We will examine a set of R-functions for which functions themselves are supplied as arguments. These functions allow us to accomplish a great deal of computation in rather concise and expressive code. Not only are they useful in R itself, but they help you to reason abstractly about computation and prepare you for functional-programming aspects of other programming languages.
14.1 Programming Paradigms
Let us begin by exploring the notion of a programming paradigm in general. We will go on in this Chapter to consider two programming paradigms for which R provides considerable support. In the next Chapter we will consider a third programming paradigm that exists in R.
A programming paradigm is a way to describe some of the features of programming languages. Often a paradigm includes principles concerning the use of these features, or embodies a view that these features have special importance and utility in good programming practice.
14.1.1 Procedural Programming
One of the older programming paradigms in existence is procedural programming. It is supported in many popular languages and is often the first paradigm within which beginners learn to program. In fact, if one’s programming does not progress beyond a rudimentary level, one may never become aware that one is working within the procedural paradigm—or any paradigm at all, for that matter.
Before we define procedural programming, let’s illustrate it with an example. Almost any of the programs we have written so far would do as examples; for specificity, let’s consider the following snippet of code that produces from the data frame m111survey a new, smaller frame consisting of just the numerical variables:
library(bcscr)
# find the number of columns in the data frame:
cols <- length(names(m111survey))
# set up a logical vector of length equal to the number of columns:
is_numerical <- logical(cols)
# loop through. For each variable, say if it is numerical:
for (i in seq_along(is_numerical)) {
  is_numerical[i] <- is.numeric(m111survey[, i])
}
# pick the numerical variables from the data frame:
num_summ_111 <- m111survey[, is_numerical]
# have a look at the result:
str(num_summ_111)
'data.frame': 71 obs. of 6 variables:
$ height : num 76 74 64 62 72 70.8 70 79 59 67 ...
$ ideal_ht : num 78 76 NA 65 72 NA 72 76 61 67 ...
$ sleep : num 9.5 7 9 7 8 10 4 6 7 7 ...
$ fastest : int 119 110 85 100 95 100 85 160 90 90 ...
$ GPA : num 3.56 2.5 3.8 3.5 3.2 3.1 3.68 2.7 2.8 NA ...
$ diff.ideal.act.: num 2 2 NA 3 0 NA 2 -3 2 0 ...
By now there is nothing mysterious about the above code-snippet. What we want to become conscious of is the approach we have taken to the problem of selecting the numerical variables. In particular, observe that:
- We worked throughout with data, some of which, like m111survey, was given to us and some of which we created on our own to help solve the problem. For example, we created the variable cols. Note also the very helpful index-variable i in the for-loop. We set up the data structure is_numerical in order to hold a set of data (TRUEs and FALSEs).
- We relied on various procedures to create data and to manipulate that data in order to produce the desired result. Some of the procedures appeared as special blocks of code, most notably the for-loop. Other procedures took the form of functions. As we know, a function encapsulates a useful procedure so that it can be easily reused in a wide variety of circumstances, without the user having to know the details of how it works. We know that names() will give us the vector of names of the columns of m111survey, that length() will tell us how many names there are, that is.numeric() will tell us whether or not a given variable in m111survey is a numerical variable, and so on. The procedures embodied in these functions were written by other folks and we could examine them if we had the time and interest, but for the most part we are content simply to know how to access them.
Procedural programming is a paradigm that solves problems with programs that can be broken up into collections of variables, data structures and procedures. In this paradigm, there is a sharp distinction between variables and data structures on the one hand and procedures on the other. Variables and data structures are data—they are the “stuff” that a program manipulates to produce other data, other “stuff.” Procedures do the manipulating, turning stuff into other stuff.
14.2 The Functional Programming Paradigm
Let us now turn to the second of the two major programming paradigms that we study in this Chapter: Functional Programming.
14.2.1 The Ubiquity of Functions in R
Let’s look a bit more closely at our code snippet. Notice how prominently functions figure into it. In fact, every line calls at least one function! This might seem unbelievable: after all, consider the line below:
num_summ_111 <- m111survey[, is_numerical]
There don’t appear to be any functions being called here! But in fact two functions get called:
- The so-called assignment operator <- is actually a function in disguise: the more official, albeit less readable, form of variable <- value is:
`<-`(variable, value)
Thus, to assign the value 3 to the variable a, one could write:
`<-`(a, 3)
# check that a is really 3:
a
[1] 3
- The sub-setting operator [, more formally known as extraction (see help(Extract)), is also a function. The expression m111survey[, is_numerical] is actually the following function-call in disguise (the empty argument between the two commas stands for the omitted row index):
`[`(m111survey, , is_numerical)
Indeed functions are ubiquitous in R. This is part of the significance of the following well-known remark by a developer of S, the precursor-language of R:
“To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.”
—John Chambers
The second slogan indicates that functions are everywhere in R. It also corresponds to the first principle of the functional programming paradigm, namely:
Computation is regarded as the evaluation of functions.
14.2.2 Functions as First-Class Citizens
So functions are ubiquitous in R. Another interesting thing about them is that even though they seem to be associated with procedures—after all, they make things happen—they are, nevertheless, also objects. They are data, or “stuff” if you like.
This may not seem obvious at first. But look at the following code, where you can ask what type of thing a function is:
typeof(is.numeric)
[1] "builtin"
The so-called “primitive” functions of R—the functions written not in R but in C-code—are “built in” objects. On the other hand, consider this user-defined function:
f <- function(x) x + 3
typeof(f)
[1] "closure"
Functions other than primitive functions are objects of type “closure.”1
If a function can be a certain type of thing, then it must be a “thing”—an object, something you can manipulate. For example, you can put functions in a list:
lst <- list(is.numeric, f)
lst
[[1]]
function (x) .Primitive("is.numeric")
[[2]]
function (x)
x + 3
Very importantly, you can make functions serve as arguments to other functions, and functions can return other functions as their results. The following example demonstrates both of these possibilities.
cuber <- function(f) {
  g <- function(x) f(x)^3
  g
}
h <- cuber(abs)
h(-2) # returns |-2|^3 = 2^3 = 8
[1] 8
In fact, in R functions can be treated just like any variable. In computer programming, we say that such functions are first-class citizens.
Although it is not often stated as a separate principle of the functional programming paradigm, the following principle holds in languages that provide support for functional programming:
Functions are first-class citizens.
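Here is a short sketch of what first-class status allows. It uses two hypothetical names of our own, summaries and apply_twice: functions are stored in a named list, pulled out by name and called, and one function is handed to another as an ordinary argument.

```r
# functions can be stored in a named list, like any other values
summaries <- list(smallest = min, largest = max, middle = median)

# a hypothetical helper that takes a function as one of its arguments
apply_twice <- function(f, x) f(f(x))

x <- c(4, 1, 9, 7)
summaries$largest(x)        # 9
summaries[["middle"]](x)    # 5.5, the median of 1, 4, 7, 9
apply_twice(sqrt, 16)       # sqrt(sqrt(16)) = 2
```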
14.2.3 Minimize Side Effects
In the code-snippet under consideration, we note that there are two types of functions:
- functions that return a value;
- functions that provide output to the console or make a change in the Global Environment.
Examples of the first type of function include:
- length()
- names()
- seq_along()
- is.numeric()
- the extraction-function `[`()
A function that produced output to the console was str().
The assignment function `<-`() added cols, is_numerical and num_summ_111 to the Global Environment, and also made changes to is_numerical in the course of the for-loop.
Of course we have seen examples of functions that do both of these things at once, for example:
my_fun <- function(x) {
  cat("my_fun is running!\n") # output to console
  x + 3 # return a value
}
my_fun(6)
my_fun is running!
[1] 9
In computer programming, output to the console, along with changes of state—changes to the Global Environment or to the file structure of your computer—are called side-effects. Functions that only return values and do not produce side-effects are called pure functions.
A third principle of the functional programming paradigm is:
Functions should be pure.
Now this principle is difficult to adhere to, and in fact if you were to adhere strictly to it in R then your programs would never “do” anything. There do exist quite practical programming languages in which all of the functions are pure (and this leads to some very interesting features, such as the fact that the order in which operations are evaluated doesn’t affect what the function returns), but these “purely functional” languages manage purity by having other objects besides functions produce the necessary side-effects. In R we happily let our functions have side-effects: we certainly want to do some assignment, and print things out to the console from time to time.
One way that R does support the third principle of functional programming is that it makes it easy to avoid having your functions modify the Global Environment. To see this consider the following example:
add_three <- function(x) {
  heavenly_hash <- 5
  x + 3 # returns this value
}
result <- add_three(10)
result
[1] 13
Now ask for the value of heavenly_hash:
heavenly_hash
Error: object 'heavenly_hash' not found
This is as we expect: the variable heavenly_hash exists only in the run-time environment that is created in the call to add_three(). As soon as the function finishes execution that environment dies, and heavenly_hash dies along with it. In particular, it never becomes part of the Global Environment.
If you really want a function to modify the Global Environment (or any environment other than its run-time environment, for that matter), then you have to take special measures. You could, for example, use the super-assignment operator <<-:
add_three_side_effect <- function(x) {
  heavenly_hash <<- 5
  x + 3 # returns this value
}
result <- add_three_side_effect(10)
result
[1] 13
heavenly_hash
[1] 5
The super-assignment operator looks for the name heavenly_hash in the parent environment of the run-time environment. If it finds heavenly_hash there, then it changes its value to 5 and stops. Otherwise it looks in the next parent up, and so on until it reaches the Global Environment, at which point, if it doesn’t find a heavenly_hash, it creates one there and gives it the value. In the example above, since the function was defined at the console, the parent environment of its run-time environment is the Global Environment, and the function has made a change to it: a side-effect.
Except in the case of explicit assignment functions like `<-`(), changes made by functions to the Global Environment can be quite problematic. After all, we are used to using functions without having to look inside them to see how they do their work. Even if we once wrote a function ourselves, we may not remember how it works; if it creates side-effects we may not remember that it does, and calling it could interfere with other important work that the program is doing. (If the program already has heavenly_hash in the Global Environment and we then call a function that changes its value, we could be in for big trouble.) Accordingly, R supports the third principle of functional programming to the extent of making it easy for you to avoid function calls that change your Global Environment.
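To make the contrast concrete, here is a small sketch with two hypothetical functions of our own. add_to_total() changes a variable in the Global Environment by super-assignment. add_to() is pure: it returns the new value and leaves the assignment to whoever calls it.

```r
# a running total kept in the Global Environment
total <- 0

# impure: modifies total in the Global Environment as a side-effect
add_to_total <- function(x) {
  total <<- total + x
  invisible(total)
}

# pure: computes a new total and returns it, changing nothing else
add_to <- function(current, x) {
  current + x
}

add_to_total(5)                 # total is now 5, though we never assigned it
new_total <- add_to(total, 10)  # the caller decides where the result goes
new_total                       # 15
total                           # still 5: add_to() did not touch it
```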
14.2.4 Procedures as Higher-Order Function Calls
The last principle of the functional programming paradigm that we will state here isn’t really a formal principle: it is really more an indication of the programming style that prevails in languages where functions are first-class objects and that provide other support for functional programming. The final principle is:
As much as possible, procedures should be accomplished by function calls. In particular, loops should be replaced by calls to higher-order functions.
A higher-order function is simply a function that takes other functions as arguments. R provides a nice set of higher-order functions, many of which substitute for iterative procedures such as loops. In subsequent sections we will study some of the most important higher-order functions, and see how they allow us to express some fairly complex procedures in a concise and readable way. You will also see how this style really blurs the distinction, so fundamental to procedural programming, between data and procedures. In functional programming, functions ARE data, and procedures are just function calls.
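As a sketch of this principle, the for-loop from the start of the Chapter can be replaced by a single call to a higher-order function. The example below assumes purrr is installed and uses map_lgl(), one of the functions studied later in this Chapter. survey is a small made-up data frame standing in for m111survey.

```r
library(purrr)

# a small made-up stand-in for m111survey
survey <- data.frame(
  height = c(76, 74, 64),
  sex = factor(c("male", "male", "female")),
  GPA = c(3.56, 2.5, 3.8),
  seat = c("front", "back", "middle")
)

# map_lgl() applies is.numeric() to each column and returns a logical vector,
# doing the job of the loop that filled in is_numerical one element at a time
is_numerical <- map_lgl(survey, is.numeric)
num_survey <- survey[, is_numerical]
str(num_survey)
```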
14.2.5 Functional Programming: A Summary
For our purposes, the principles of the functional programming paradigm are as follows:
- Computation consists in the evaluation of functions.
- Functions are first-class citizens in the language.
- Functions should only return values; they should not produce side-effects. (At the very least they should not modify the Global Environment unless they are dedicated to assignment in the first place.)
- As much as possible, procedures should be written in terms of function calls. In particular, loops should be replaced by calls to higher-order functions.
14.3 purrr Higher-Order Functions for Iteration
In the remainder of the Chapter we will study important higher-order functions: functions that take a function as an argument and apply that function to each element of another data structure. As we have said previously, such functions often serve as alternatives to loops.
The higher-order functions we study come from the package purrr, which is attached whenever we load the tidyverse.
14.3.1 map() and Variations
Suppose that we want to generate five vectors, each of which consists of ten numbers randomly chosen between 0 and 1. We accomplish the task with a loop, as follows:
# set up a list of length 5:
lst <- vector(mode = "list", length = 5)
for (i in 1:5) {
  lst[[i]] <- runif(10)
}
str(lst)
List of 5
$ : num [1:10] 0.988 0.46 0.973 0.474 0.762 ...
$ : num [1:10] 0.33 0.341 0.879 0.785 0.925 ...
$ : num [1:10] 0.951 0.631 0.112 0.327 0.455 ...
$ : num [1:10] 0.6967 0.0229 0.9949 0.2573 0.1891 ...
$ : num [1:10] 0.4634 0.6867 0.7202 0.8735 0.0791 ...
If we wanted the vectors to have lengths \(1, 4, 9, 16\) and \(25\), then we could write:
lst <- vector(mode = "list", length = 5)
for (i in 1:5) {
  lst[[i]] <- runif(i^2)
}
str(lst)
List of 5
$ : num 0.647
$ : num [1:4] 0.394 0.619 0.477 0.136
$ : num [1:9] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
$ : num [1:16] 0.409 0.54 0.961 0.654 0.547 ...
$ : num [1:25] 0.96407 0.07147 0.95581 0.94798 0.00119 ...
In the first example, the elements in the vector 1:5 didn’t matter (we wanted a vector of length ten each time), and in the second case the elements in 1:5 did matter, in that they determined the lengths of the five vectors produced. Of course in general we could apply runif() to each element of any vector at all, like this:
vec <- c(5, 7, 8, 2, 9)
lst <- vector(mode = "list", length = length(vec))
for (i in seq_along(vec)) {
  lst[[i]] <- runif(vec[i])
}
str(lst)
List of 5
$ : num [1:5] 0.647 0.394 0.619 0.477 0.136
$ : num [1:7] 0.06738 0.12915 0.39312 0.00258 0.62021 ...
$ : num [1:8] 0.826 0.423 0.409 0.54 0.961 ...
$ : num [1:2] 0.1968 0.0779
$ : num [1:9] 0.818 0.942 0.884 0.166 0.355 ...
If we can apply runif() to each element of a vector, why not apply an arbitrary function to each element? That’s what the function map() will do for us. The general form of map() is:
map(.x, .f, ...)
In the template above:
- .x can be a list or any atomic vector;
- .f is a function that is to be applied to each element of .x. In the default operation of map(), each element of .x becomes in turn the first argument of .f;
- ... consists of other arguments that are supplied as arguments for the .f function, in case you have to set other parameters of the function in order to get it to perform in the way you would like.
The result is always a list.
With map() we can get the desired list as follows (try it):
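A minimal sketch of that call, assuming purrr is installed. It reuses the vector c(5, 7, 8, 2, 9) from the loop above; the lengths reported by str() are fixed, but the random numbers will differ from run to run.

```r
library(purrr)

vec <- c(5, 7, 8, 2, 9)
# each element of vec becomes, in turn, the first argument (n) of runif()
lst <- map(vec, runif)
str(lst)
```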
If we had wanted the random numbers to be between, say, 4 and 8, then we would supply extra arguments to runif() as follows (try it):
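One way to write this, again assuming purrr is installed and reusing the same vector: min and max travel through the ... argument of map() and are handed to every call of runif().

```r
library(purrr)

vec <- c(5, 7, 8, 2, 9)
# min and max are passed through ... to each call of runif()
lst <- map(vec, runif, min = 4, max = 8)
str(lst)
```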
The default behavior of map() is that the .x vector supplies the first argument of .f. However, if some ... parameters are supplied, then .x substitutes for the first parameter that is not mentioned in the ... arguments. In the above example, the min and max parameters are the second and third parameters for runif(), so .x substitutes for the first parameter, the one that determines how many random numbers will be generated. In the example below, the vector lower_bounds substitutes for min, the second parameter of runif() (try it):
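A sketch of such a call, in which the values of lower_bounds, n and max are made up for illustration. Because n and max are supplied by name, each element of lower_bounds lands on the first unnamed parameter of runif(), which is min.

```r
library(purrr)

lower_bounds <- c(0, 5, 10)
# each call is runif(lower_bounds[i], n = 3, max = 20),
# so lower_bounds[i] is matched to min
lst <- map(lower_bounds, runif, n = 3, max = 20)
str(lst)
```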
Sometimes we wish to vary two or more of the parameters of a function. In that case we use pmap(). The first parameter of pmap() is named .l and takes a list of vectors (or lists). For example:
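One possible version of this call, consistent with the explanation that follows. The names how_many and upper_bounds come from the text, but their values here are made up.

```r
library(purrr)

how_many <- c(2, 3, 4)
upper_bounds <- c(1, 10, 100)
# each call is runif(how_many[i], upper_bounds[i], min = 0):
# min is named, so upper_bounds lands on the next free parameter, max
lst <- pmap(list(how_many, upper_bounds), runif, min = 0)
str(lst)
```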
Observe that pmap() knows to interpret the first element of the input-list, the vector how_many, as giving the values of the first argument of runif(). The second parameter of runif() (min) is set at 0, so upper_bounds, the second element of the input-list, supplies the values for the next parameter in line, the parameter max.
One might just as well use pmap() to vary all three parameters:
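A sketch of this, with made-up values. Here the input-list is named, and pmap() passes each vector to the runif() parameter of the same name, so the order of the list no longer matters.

```r
library(purrr)

# each named element of the list supplies one parameter of runif()
params <- list(
  n = c(2, 3, 4),
  min = c(0, 10, 100),
  max = c(1, 20, 200)
)
lst <- pmap(params, runif)
str(lst)
```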
The .f
parameter can be any function, including one that you define yourself. Here’s an example:
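A sketch with a hypothetical function of our own, draws_up_to(), which returns n random numbers between 0 and its first argument. map() feeds each element of the input vector to that first argument.

```r
library(purrr)

# a hypothetical user-defined function: n random numbers between 0 and x
draws_up_to <- function(x, n = 3) {
  runif(n, min = 0, max = x)
}
lst <- map(c(1, 3, 5), draws_up_to)
str(lst)
```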
You could also set .f to be a function that you write on the spot, without even bothering to give it a name:
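A sketch of such a call, chosen to match the shortcut version given in the text just below. Each element of c(1, 3, 5) becomes the upper bound for three random numbers.

```r
library(purrr)

# function(x) ... is defined inside the call and never given a name
lst <- map(c(1, 3, 5), function(x) runif(3, min = 0, max = x))
str(lst)
```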
In computer programming a function is called anonymous when it is not bound to a name.
map() allows a shortcut for defining anonymous functions. The above call could have been written as:
c(1, 3, 5) %>%
  map(~ runif(3, min = 0, max = .))
The ~ indicates that the body of the function is about to begin. The . stands for the parameter of the function.
When we introduced map() we said that .x was a vector or a list. In fact .x could be any object that can be coerced into a list. Hence it is quite common to use map() with data frames: the frame is turned into a list, each element of which is a column of the frame. Here is an example:
data("m111survey", package = "bcscr")
number_na <-
  m111survey %>%
  map(~ sum(is.na(.)))
str(number_na)
List of 12
$ height : int 0
$ ideal_ht : int 2
$ sleep : int 0
$ fastest : int 0
$ weight_feel : int 0
$ love_first : int 0
$ extra_life : int 0
$ seat : int 0
$ GPA : int 1
$ enough_Sleep : int 0
$ sex : int 0
$ diff.ideal.act.: int 2
Note that the elements of the returned list inherit the names of the input data frame. This holds for any named input:
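For instance, here is a sketch with a named numeric vector whose names and values are made up: the list that map() returns carries the same names.

```r
library(purrr)

# a named numeric vector of heights in inches (made-up values)
heights <- c(ann = 64, bob = 70, cy = 68)
in_feet <- map(heights, ~ . / 12)
names(in_feet)   # "ann" "bob" "cy"
```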
When the result can take on a form simpler than a list, it is possible to use variants of map() such as:
- map_int()
- map_dbl()
- map_lgl()
- map_chr()
Thus we could obtain a named integer vector of the number of NA-values for each variable in m111survey as follows:
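Here is a sketch of the idea on a small made-up data frame, survey, standing in for m111survey so that it runs without the bcscr package. With the real data one would pass m111survey instead.

```r
library(purrr)

# a small made-up stand-in for m111survey
survey <- data.frame(
  height = c(76, NA, 64),
  sex = factor(c("male", "male", "female")),
  GPA = c(3.56, NA, NA),
  fastest = c(119L, 110L, 85L)
)
# map_int() returns a named integer vector instead of a list
number_na <- map_int(survey, ~ sum(is.na(.)))
number_na
```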
Here are the types of each variable:
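A sketch on the same kind of made-up stand-in data frame, using base R's typeof() as the .f function. Note that a factor is stored internally as integers, so typeof() reports "integer" for it.

```r
library(purrr)

# a small made-up stand-in for m111survey
survey <- data.frame(
  height = c(76, NA, 64),
  sex = factor(c("male", "male", "female")),
  GPA = c(3.56, NA, NA),
  fastest = c(119L, 110L, 85L)
)
# map_chr() returns a named character vector
types <- map_chr(survey, typeof)
types
```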
Here is a statement of whether or not each variable is a factor:
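Again a sketch on a made-up stand-in data frame, with is.factor() as the .f function and map_lgl() returning a named logical vector.

```r
library(purrr)

# a small made-up stand-in for m111survey
survey <- data.frame(
  height = c(76, NA, 64),
  sex = factor(c("male", "male", "female")),
  GPA = c(3.56, NA, NA),
  fastest = c(119L, 110L, 85L)
)
is_fac <- map_lgl(survey, is.factor)
is_fac
```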
14.3.2 walk() and Variations
walk() is similar to map(), but is used when we are interested in producing side-effects. It applies its .f argument to each element of the .x it was given, and then returns .x (invisibly) in case we want to pipe it into some other function.
Here we use walk() only for its side-effect: we re-write a familiar function to print a pattern to the Console without using a loop.
pattern <- function(char = "*", n = 5) {
  line_length <- c(1:n, (n-1):1)
  the_line <- function(char, n) {
    cat(rep(char, times = n), "\n", sep = "")
  }
  line_length %>% walk(the_line, char = char)
}
pattern(char = "a", n = 7)
a
aa
aaa
aaaa
aaaaa
aaaaaa
aaaaaaa
aaaaaa
aaaaa
aaaa
aaa
aa
a
The next example illustrates the use of the return-value of walk(). We would like to save plots of all numerical variables from the data frame m111survey, and also print summaries of them to the Console.
First we create a directory to hold the plots:
if ( !dir.exists("plots") ) dir.create("plots")
Next, we get the numerical variables in m111survey:
numericals <-
  m111survey %>%
  keep(is.numeric) # purrr::keep()
We used purrr::keep(), which retains only the elements of its input .x for which its second argument .p (a function that returns a single TRUE or FALSE) returns TRUE.
We will also need the names of the numerical variables:
num_names <-
  numericals %>%
  names()
We need a function to save the density plot of a single numerical variable:
save_graph <- function(var, varname) {
  p <-
    ggplot(data = NULL, aes(x = var)) +
    geom_density(fill = "burlywood") +
    labs(title = paste0(
      "Density plot for ",
      varname,
      "."
    ))
  ggsave(
    filename = paste0("plots/density_", varname, ".png"),
    plot = p, device = "png"
  )
}
We also need a function to produce a summary of a single numerical variable:
make_summary <- function(x, varname) {
  five <- fivenum(x, na.rm = TRUE)
  list(
    variable = varname,
    min = five[1],
    Q1 = five[2],
    median = five[3],
    Q3 = five[4],
    max = five[5]
  )
}
Now we walk through the process. We will actually use the function pwalk(), which will take the following inputs:
- .l (a list with two elements: the data frame of numerical variables and the vector of the names of these variables), and
- .f (the function save_graph, to make and save a density plot).
We also use pmap_dfr(), which takes a list consisting of the data frame and variable-names and constructs a data frame row-by-row, with each row summarizing one of the variables.
list(numericals, num_names) %>%
pwalk(save_graph) %>% # returns the list
pmap_dfr(make_summary)
# A tibble: 6 x 6
variable min Q1 median Q3 max
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 height 51.0 65.0 68.000 71.75 79
2 ideal_ht 54.0 67.0 68.000 75.00 90
3 sleep 2.0 5.0 7.000 7.00 10
4 fastest 60.0 90.5 102.000 119.50 190
5 GPA 1.9 2.9 3.225 3.56 4
6 diff.ideal.act. -4.0 0.0 2.000 3.00 18
Check the plots directory; it should contain these files:
density_diff.ideal.act..png
density_fastest.png
density_GPA.png
density_height.png
density_ideal_ht.png
density_sleep.png
14.3.3 Example: Flowery Meadow Redux
In Section 4.3.3.1 we simulated people walking through a meadow, picking flowers until they had picked a desired number of flowers of a desired color. In Section 9.3.3 we used lists to store the results of such a simulation. Now we’ll see how to store the results as a data frame.
First, we modify the helper-function that simulates one person picking flowers so that, instead of returning a vector of colors, it returns a data frame:
## colors in the field:
flower_colors <- c("blue", "red", "pink", "crimson", "orange")
## new helper-function:
walk_meadow_df <- function(person, color, wanted) {
  picking <- TRUE
  ## the following will be extended to hold the flowers picked:
  flowers_picked <- character()
  desired_count <- 0
  while (picking) {
    picked <- sample(flower_colors, size = 1)
    flowers_picked <- c(flowers_picked, picked)
    if (picked == color) desired_count <- desired_count + 1
    if (desired_count == wanted) picking <- FALSE
  }
  ## return a data frame:
  data.frame(
    person = rep(person, times = length(flowers_picked)),
    color = flowers_picked
  )
}
Note that the new function takes an extra parameter person, the name of the person picking the flowers.
Let’s try it out:
walk_meadow_df("Scarecrow", "red", 1)
person color
1 Scarecrow blue
2 Scarecrow blue
3 Scarecrow crimson
4 Scarecrow crimson
5 Scarecrow crimson
6 Scarecrow crimson
7 Scarecrow red
Now we write the function to make the data frame of results for a group of people. pmap() will come in handy.
all_walk_df <- function(people, favs, numbers) {
  ## bundle the three vectors into a list:
  list(people, favs, numbers) %>%
    ## run it through pmap() to get a list of data frames:
    pmap(walk_meadow_df) %>%
    ## the following purrr function binds the list of data frames
    ## row-wise into a single data frame:
    list_rbind()
}
Let’s try it out:
results <-
  all_walk_df(
    people = c("Dorothy", "Toto"),
    favs = c("blue", "orange"),
    numbers = c(4, 2)
  )
Here are the results:
14.3.4 Practice Exercises
14.4 Other purrr Higher-Order Functions
14.4.1 keep() and discard()
keep() is similar to dplyr’s filter(), but whereas filter() chooses rows of a data frame based on a given condition, keep() chooses the elements of the input list or vector .x based on a condition named .p.
Examples:
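Two small sketches, assuming purrr is installed; the inputs are made up. The first keeps the even numbers in 1:10, and the second keeps only the character elements of a list.

```r
library(purrr)

# keep only the even numbers
evens <- keep(1:10, function(x) x %% 2 == 0)
evens   # 2 4 6 8 10

# keep only the list elements that are character vectors
texts <- keep(list(1, "a", TRUE, "b"), is.character)
str(texts)
```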
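discard(.x, .p = condition) does the opposite of keep(): it removes the elements for which condition returns TRUE, so it is equivalent to keep(.x, .p = function(x) !condition(x)). Thus: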
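A sketch with the same made-up condition as before: discard() returns exactly the elements that keep() leaves out.

```r
library(purrr)

is_even <- function(x) x %% 2 == 0
evens <- keep(1:10, is_even)
odds <- discard(1:10, is_even)
odds   # 1 3 5 7 9
```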
14.4.2 reduce()
Another important member of the purrr family is reduce(). Given a vector .x and a function .f that takes two inputs, reduce() does the following:
- applies .f to elements 1 and 2 of .x, getting a result;
- applies .f to the result and to element 3 of .x, getting another result;
- applies .f to this new result and to element 4 of .x, getting yet another result;
- … and so on until all of the elements of .x have been exhausted;
- then returns the final result in the above series of operations.
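To see the order in which .f is applied, here is a small sketch; the function show_step() and the strings are made up for illustration. Its .f simply writes each combination down as text, and the parentheses show that the accumulated result is always the left-hand input.

```r
library(purrr)

# .f records each pairing as text instead of computing anything
show_step <- function(acc, curr) paste0("(", acc, " + ", curr, ")")

nested <- reduce(c("a", "b", "c", "d"), show_step)
nested   # "(((a + b) + c) + d)"
```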
For example, suppose that you want to add up the elements of the vector:
vec <- c(3, 1, 4, 6)
Of course you could just use:
sum(vec)
[1] 14
After all, sum()
has been written to apply to many elements at once. But what if addition could only be done two numbers at a time? How might you proceed? You could:
- add the 3 and 1 of (the first two elements of
vec
), getting 4; - then add 4 to 4, the third element of
vec
, getting 8; - then add 8 to 6, the final element of
vec
, getting 14; - then return 14.
reduce()
operates in this way.
vec %>%
  reduce(.f = sum)
[1] 14
Can you see how reduce() gets its name? Step by step, it “reduces” its .x argument, which may consist of many elements, to a single value.
A common application of reduce() is to take an operation that is defined on only two items and extend it to operate on any number of items. Consider, for example, the function intersect(), which will find the intersection of any two vectors of the same type:
vec1 <- c(3, 4, 5, 6)
vec2 <- c(4, 6, 8, -4)
intersect(vec1, vec2)
[1] 4 6
You cannot intersect three or more vectors at once:
intersect(vec1, vec2, c(4, 7, 9))
Error in base::intersect(x, y, ...): unused argument (c(4, 7, 9))
With reduce() you can intersect as many vectors as you like, provided that they are first stored in a list.
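A sketch using vec1 and vec2 from above together with the third vector from the failed call. reduce() first intersects vec1 with vec2, then intersects that result with the third vector.

```r
library(purrr)

vec1 <- c(3, 4, 5, 6)
vec2 <- c(4, 6, 8, -4)
vec3 <- c(4, 7, 9)
# intersect() is applied two vectors at a time:
# intersect(vec1, vec2) gives c(4, 6); intersecting that with vec3 gives 4
common <- reduce(list(vec1, vec2, vec3), intersect)
common
```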
You can write your own function to supply as the argument for .f, but it has to be able to operate on two arguments. reduce() will take the first argument of the .f function to be what has been “accumulated” so far, and the second argument of the .f function (the value to be combined with what has been accumulated) will be provided by the current element of .x.
As a simple example, let’s write our own reduce-summer in a way that shows the user the reduction process at work:
## the .f function:
my_summer <- function(acc, curr) {
  cat("So far I have ", acc, ",\n")
  cat(
    "But just now I was given ", curr,
    " to add in.\n\n", sep = ""
  )
  sum(acc, curr)
}
## .x will be the whole numbers from 1 to 4:
1:4 %>%
  reduce(.f = my_summer)
So far I have 1 ,
But just now I was given 2 to add in.
So far I have 3 ,
But just now I was given 3 to add in.
So far I have 6 ,
But just now I was given 4 to add in.
[1] 10
When you write your own .f function, it’s a good idea to use names for the parameters that remind you of their role in the reduction process. acc (for “accumulated”) and curr (for “current”) are used above.
reduce() can take an argument called .init. When this argument is given a value, operation begins by applying .f to .init and the first element of .x. For example:
1:4 %>%
reduce(.f = my_summer, .init = 100)
So far I have 100 ,
But just now I was given 1 to add in.
So far I have 101 ,
But just now I was given 2 to add in.
So far I have 103 ,
But just now I was given 3 to add in.
So far I have 106 ,
But just now I was given 4 to add in.
[1] 110
14.4.3 Practice Exercises
14.5 Functionals vs. Loops
The higher-order functions we have studied in this chapter are often called functionals. As we pointed out earlier, they deliver results that could have been produced by writing a loop of some sort.
Once you get used to functionals, you will find that they are often more “expressive” than loops: easier for others to read and to understand, and less prone to bugs. Also, some of them are implemented partly in compiled code and can run a bit faster than an equivalent loop written in R.
For example, consider the following list. It consists of ten thousand vectors, each of which contains 100 randomly-generated numbers.
lst <- map(rep(100, 10000), runif)
If we want the mean of each vector, we could write a loop:
means <- numeric(10000)
for (i in 1:10000) {
  means[i] <- mean(lst[[i]])
}
Or we could use map_dbl():
means <- map_dbl(lst, mean)
Comparing the two using system.time(), on my machine I got:
system.time(means <- map_dbl(lst, mean))
## user system elapsed
## 1.557 0.073 1.630
For the loop, I get:
system.time({
  means <- numeric(10000)
  for (i in 1:10000) {
    means[i] <- mean(lst[[i]])
  }
})
## user system elapsed
## 1.653 0.075 1.730
The map-function is a bit faster, but the difference is small.
Remember also that vectorization is much faster than looping, and is also usually quite expressive, so don’t struggle to take a functional approach when vectorization is possible. (This advice applies to several examples from this Chapter, in which the desired computations had already been accomplished in earlier chapters by some form of vectorization.)
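As a sketch of that advice, suppose the vectors all have the same length (as they do in the list above), so that they can be stored as the rows of a matrix. Then base R’s rowMeans() computes every mean in a single vectorized call. The names m, means_map and means_vec are our own, and since timings vary by machine, none are shown.

```r
library(purrr)

# 10,000 vectors of 100 random numbers, stored as the rows of a matrix
m <- matrix(runif(100 * 10000), nrow = 10000, ncol = 100)

# functional approach: one call to mean() per row
means_map <- map_dbl(seq_len(nrow(m)), function(i) mean(m[i, ]))

# vectorized approach: all 10,000 row means in one call
means_vec <- rowMeans(m)

all.equal(means_map, means_vec)   # the two agree up to rounding error
```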
14.6 Conclusion
In this Chapter we have concentrated on only a single aspect of the Functional Programming paradigm: exploiting the fact that functions are first-class citizens in R, we studied a number of higher-order functions that can substitute for loops. There is certainly a great deal more to Functional Programming than the mere avoidance of loops, but we’ll end our study at this point. Familiarity with higher-order functions will stand you in good stead when you begin, in subsequent courses on web programming, to learn the JavaScript language. JavaScript makes constant use of higher-order functions!
14.7 More in Depth
14.7.1 An Extended Example of Reduction
Let’s apply reduce() with .init to the task of making a truth table: the set of all \(2^n\) logical vectors of a given length \(n\).
The set \(S_1\) of vectors of length \(n = 1\) consists of only two vectors:
vec1 TRUE
vec2 FALSE
Now consider a systematic way to construct the set \(S_2\) of all the vectors of length two. We know that there are four such vectors:
vec1 TRUE TRUE
vec2 TRUE FALSE
vec3 FALSE TRUE
vec4 FALSE FALSE
Observe that the first two of them begin with TRUE and end with the set \(S_1\) of vectors of length one:
vec1 TRUE TRUE
vec2 TRUE FALSE
The last two of them begin with FALSE and also end with \(S_1\):
vec3 FALSE TRUE
vec4 FALSE FALSE
Now consider \(S_3\), the set of all eight vectors of length three:
vec1 TRUE TRUE TRUE
vec2 TRUE TRUE FALSE
vec3 TRUE FALSE TRUE
vec4 TRUE FALSE FALSE
vec5 FALSE TRUE TRUE
vec6 FALSE TRUE FALSE
vec7 FALSE FALSE TRUE
vec8 FALSE FALSE FALSE
Observe that the first four of them begin with TRUE and end with the vectors of \(S_2\):
vec1 TRUE TRUE TRUE
vec2 TRUE TRUE FALSE
vec3 TRUE FALSE TRUE
vec4 TRUE FALSE FALSE
The last four of them begin with FALSE and also end with the vectors of \(S_2\):
vec5 FALSE TRUE TRUE
vec6 FALSE TRUE FALSE
vec7 FALSE FALSE TRUE
vec8 FALSE FALSE FALSE
The pattern is now clear. If for any \(m \ge 1\) you are in possession of the \(2^m \times m\) matrix \(S_m\) of all possible vectors of length \(m\), then to obtain the \(2^{m+1} \times (m+1)\) matrix \(S_{m+1}\) of all possible vectors of length \(m+1\) you should:
- stack \(2^m\) TRUEs on top of \(2^m\) FALSEs, creating a \(2^{m+1} \times 1\) matrix \(U\);
- stack \(S_m\) underneath itself, creating a \(2^{m+1} \times m\) matrix \(V\);
- place \(U\) next to \(V\).
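Before wrapping these steps in reduce(), it may help to carry out a single step by hand. The sketch below builds \(S_2\) from \(S_1\) (so \(m = 1\)) using only base R; the variable names are chosen for illustration and the row and column names are dropped to keep the result plain:

```r
# S_1: the 2 x 1 matrix of all logical vectors of length one
s1 <- matrix(c(TRUE, FALSE), ncol = 1)

# U: 2^m TRUEs stacked on top of 2^m FALSEs (here 2^m = nrow(s1) = 2)
m_rows <- nrow(s1)
u <- c(rep(TRUE, times = m_rows), rep(FALSE, times = m_rows))

# V: S_1 stacked underneath itself
v <- rbind(s1, s1)

# S_2: place U next to V, then drop the dimension names
s2 <- cbind(u, v)
dimnames(s2) <- NULL
s2
```

Repeating exactly the same three steps with s2 in place of s1 would yield \(S_3\), which is why the process suits reduce().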
reduce() with .init set to \(S_1\) is appropriate for this iterative building process. Here is an implementation:
library(purrr)  # provides reduce()

make_table <- function(n, verbose = FALSE) {
  # make .init (S_1)
  s1 <- matrix(c(TRUE, FALSE), nrow = 2)
  rownames(s1) <- c("vec1", "vec2")
  colnames(s1) <- c("")
  # make .f
  build_next <- function(accum, value) {
    if (verbose) {
      cat(
        "On value ", value,
        " with accumulated material:",
        sep = ""
      )
      print(accum)
    }
    # S_1 is already in hand, so the first step passes it through unchanged
    if (value == 1) return(accum)
    r <- nrow(accum)
    # U: r TRUEs on top of r FALSEs
    u <- c(
      rep(TRUE, times = r),
      rep(FALSE, times = r)
    )
    # V: the accumulated matrix stacked underneath itself
    v <- rbind(accum, accum)
    # place U next to V
    next_matrix <- cbind(u, v)
    colnames(next_matrix) <- rep("", times = value)
    rownames(next_matrix) <- paste(
      "vec", 1:(2^value), sep = ""
    )
    if (verbose) {
      cat(
        "Finishing value", value,
        ", and I've built:",
        sep = ""
      )
      print(next_matrix)
      cat("\n\n")
    }
    next_matrix
  }
  # build from .init to the final product S_n
  reduce(.x = 1:n, .f = build_next, .init = s1)
}
We have included a verbose option so we can watch the process as it unfolds.
Note also that the parameters for the .f function are named accum (what has been “accumulated” up to the current step) and value (the value of .x at the current step). It’s conventional to give these or similar names to the parameters of the building function.
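To see how .init, accum, and value are threaded through the steps, here is a small example unrelated to truth tables. It assumes the purrr package, and the building function add_step() is hypothetical. The companion function accumulate() takes the same arguments as reduce() but returns every intermediate result, starting with .init, instead of only the last one:

```r
library(purrr)

# Hypothetical building function: add the current value of .x
# to whatever has been accumulated so far
add_step <- function(accum, value) accum + value

# reduce() returns only the final accumulated result: 100 + 1 + 2 + 3
total <- reduce(.x = 1:3, .f = add_step, .init = 100)
total

# accumulate() returns .init followed by each intermediate result
steps <- accumulate(.x = 1:3, .f = add_step, .init = 100)
steps
```

In make_table(), the verbose option plays a similar role: it lets us watch each intermediate matrix as it is handed on to the next step.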
Let’s try it out:
make_table(3, verbose = TRUE)
On value 1 with accumulated material:
vec1 TRUE
vec2 FALSE
On value 2 with accumulated material:
vec1 TRUE
vec2 FALSE
Finishing value2, and I've built:
vec1 TRUE TRUE
vec2 TRUE FALSE
vec3 FALSE TRUE
vec4 FALSE FALSE
On value 3 with accumulated material:
vec1 TRUE TRUE
vec2 TRUE FALSE
vec3 FALSE TRUE
vec4 FALSE FALSE
Finishing value3, and I've built:
vec1 TRUE TRUE TRUE
vec2 TRUE TRUE FALSE
vec3 TRUE FALSE TRUE
vec4 TRUE FALSE FALSE
vec5 FALSE TRUE TRUE
vec6 FALSE TRUE FALSE
vec7 FALSE FALSE TRUE
vec8 FALSE FALSE FALSE
vec1 TRUE TRUE TRUE
vec2 TRUE TRUE FALSE
vec3 TRUE FALSE TRUE
vec4 TRUE FALSE FALSE
vec5 FALSE TRUE TRUE
vec6 FALSE TRUE FALSE
vec7 FALSE FALSE TRUE
vec8 FALSE FALSE FALSE
Of course in practice we would not turn on the verbose option:
make_table(4)
vec1 TRUE TRUE TRUE TRUE
vec2 TRUE TRUE TRUE FALSE
vec3 TRUE TRUE FALSE TRUE
vec4 TRUE TRUE FALSE FALSE
vec5 TRUE FALSE TRUE TRUE
vec6 TRUE FALSE TRUE FALSE
vec7 TRUE FALSE FALSE TRUE
vec8 TRUE FALSE FALSE FALSE
vec9 FALSE TRUE TRUE TRUE
vec10 FALSE TRUE TRUE FALSE
vec11 FALSE TRUE FALSE TRUE
vec12 FALSE TRUE FALSE FALSE
vec13 FALSE FALSE TRUE TRUE
vec14 FALSE FALSE TRUE FALSE
vec15 FALSE FALSE FALSE TRUE
vec16 FALSE FALSE FALSE FALSE
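As a cross-check (not part of the reduce() approach), base R’s expand.grid() can also produce every combination of TRUE and FALSE. It varies its first column fastest, so the sketch below reverses the column order to get the same row order as make_table(). The helper name truth_table_grid() is hypothetical, and it drops the row and column names:

```r
# Hypothetical helper: all 2^n logical vectors of length n, one per row,
# with the last column varying fastest (the row order used by make_table())
truth_table_grid <- function(n) {
  # expand.grid() varies its first column fastest
  grid <- expand.grid(
    rep(list(c(TRUE, FALSE)), times = n),
    KEEP.OUT.ATTRS = FALSE
  )
  # reverse the columns so that the last column varies fastest instead
  grid <- grid[, rev(seq_len(n)), drop = FALSE]
  m <- as.matrix(grid)
  dimnames(m) <- NULL
  m
}

truth_table_grid(3)
```

The reduce() version is more instructive here because it makes the recursive structure of \(S_{m+1}\) explicit; expand.grid() is simply a convenient way to check the result.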
Glossary
- Programming Paradigm: A programming paradigm is a way to describe some of the features of programming languages. Often a paradigm includes principles concerning the use of these features, or embodies a view that these features have special importance and utility in good programming practice.
- Procedural Programming: A programming paradigm that solves problems with programs that can be broken up into collections of variables, data structures and procedures. This paradigm tends to draw a sharp distinction between variables and data structures on the one hand and procedures on the other.
- Functional Programming: A programming paradigm that stresses the central role of functions. Some of its basic principles are:
  - Computation consists in the evaluation of functions.
  - Functions are first-class citizens in the language.
  - Functions should only return values; they should not produce side-effects.
  - As much as possible, procedures should be written in terms of function calls.
- Pure Function: A function that does not produce side-effects.
- Side Effect: A change in the state of the program (e.g., a change in the Global Environment) or any interaction external to the program (e.g., printing to the console).
- Higher-Order Function: A function that takes another function as an argument.
- Anonymous Function: A function that does not have a name.
- Refactoring: The act of rewriting computer code so that it performs the same task as before, but in a different way. (This is usually done to make the code more human-readable or to make it perform the task more quickly.)
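Several of these glossary terms can be seen side by side in a few lines of R. The function names below are hypothetical, chosen only for illustration:

```r
# Pure function: the result depends only on the input, and nothing else changes
add_one <- function(x) x + 1

# Function with side effects: it prints to the console, and it changes
# call_count in the environment where it was defined (here via <<-)
call_count <- 0
noisy_add_one <- function(x) {
  call_count <<- call_count + 1
  cat("Adding one to", x, "\n")
  x + 1
}

# Higher-order function: it takes another function, f, as an argument
apply_twice <- function(f, x) f(f(x))

# A named function supplied as the argument
apply_twice(add_one, 3)

# An anonymous function supplied directly as the argument
apply_twice(function(x) x * 10, 3)
```

Calling noisy_add_one() returns the same value as add_one(), but it also leaves traces outside itself, which is exactly what the functional paradigm asks us to avoid where possible.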
Exercises
Exercise 1
Explain in words what the following line of code produces when given a numerical vector y:
map(y, function(x) x^3 + 1)
In the course of your explanation, say whether the result is a vector or a list.
Exercise 2
Which do you think works faster for a given numerical vector y? This code:
map(y, function(x) sqrt(x))
Or this code?
sqrt(y)
Justify your answer with a convincing example, using system.time(). What moral do you draw from this?
Exercise 3
To refactor computer code is to rewrite the code so that it does the same thing, but in a different way. We might refactor code in order to make it more readable by humans, or to make it perform its task more quickly.
Refactor the following code so that it uses keep() instead of a loop:
df <- bcscr::m111survey
keep_variable <- logical(length(names(df)))
for (col in seq_along(keep_variable)) {
  var <- df[, col]
  is_numeric <- is.numeric(var)
  all_there <- !any(is.na(var))
  keep_variable[col] <- is_numeric && all_there
}
new_frame <- df[, keep_variable]
head(new_frame)
Exercise 4
The following function produces a list of vectors of uniform random numbers, where the lower and upper bounds of the numbers are given by the arguments to the parameters lower and upper respectively, and the number of vectors in the list and the number of random numbers in each vector are given by a vector supplied to the parameter vecs.
random_sims <- function(vecs, lower = 0, upper = 1, seed = NULL) {
  # set the seed if one is provided by the user
  if (!is.null(seed)) {
    set.seed(seed)
  }
  lst <- vector(mode = "list", length = length(vecs))
  for (i in seq_along(vecs)) {
    lst[[i]] <- runif(vecs[i], min = lower, max = upper)
  }
  lst
}
Refactor the code for random_sims() so that it uses map() instead of a loop.
Exercise 5
The following enhanced version of random_sims() is even more flexible, as it allows both the upper and lower limits for the randomly-generated numbers to vary with each vector of numbers that is produced.
random_sims2 <- function(vecs, lower, upper, seed = NULL) {
  # validate input
  if (!(length(vecs) == length(upper) && length(upper) == length(lower))) {
    return(
      cat("All vectors entered must have the same length.")
    )
  }
  if (any(upper < lower)) {
    return(
      cat(paste0(
        "Every upper bound must be at least as ",
        "big as the corresponding lower bound."
      ))
    )
  }
  # set the seed if one is provided by the user
  if (!is.null(seed)) {
    set.seed(seed)
  }
  lst <- vector(mode = "list", length = length(vecs))
  for (i in seq_along(vecs)) {
    lst[[i]] <- runif(
      vecs[i],
      min = lower[i], max = upper[i]
    )
  }
  lst
}
Use pmap() to refactor the code for random_sims2() so as to avoid using the loop.
Exercise 6
Supposing that y is a numerical vector, explain in words what the following code produces:
y %>% keep(function(x) x >= 4)
Exercise 7
Write a line of code using the subsetting operator [ that produces the same result as the code in the previous problem.
Exercise 8
Use keep() to write a function called odd_members() that, given any numerical vector, returns a vector containing the odd numbers of the given vector. Your function should take a single argument called vec, the given vector. A typical example of use would be as follows:
odd_members(vec = 1:10)
[1] 1 3 5 7 9
Exercise 9
You are given the following list of character vectors:
lst <- list(
  c("Akash", "Bipan", "Chandra", "Devadatta", "Raj"),
  c("Raj", "Vikram", "Sita", "Akash", "Chandra"),
  c("Akash", "Raj", "Chandra", "Bipan", "Lila"),
  c("Akash", "Vikram", "Devadatta", "Raj", "Lila")
)
Use reduce() and the union() function to obtain a character vector that is the union of all the vectors in lst.
Exercise 10
Remember the function subStrings() from the exercises of the Chapter on Strings? Refactor it so that it does EXACTLY the same thing but makes no use of loops.
Exercise 11
Solve Part One of Advent of Code 2022 Day 3. Save your input file in your submit folder with the filename input_aoc_2022-03.txt, and read in the input data, naming it input, using the following code:
input <- readLines("input_aoc_2022-03.txt")
Exercise 12
Solve Part One of Advent of Code 2022 Day 25. Save your input file in your submit folder with the filename input_aoc_2022-25.txt, and read in the input data, naming it input, using the following code:
input <- readLines("input_aoc_2022-25.txt")
Hint: The snafu numbering system bears some relationship to base-5 numbering. After reviewing Section 11.4, write two helper functions: one to convert numbers to “base-snafu” and another to convert base-snafu representations to numbers.
The term “closure” comes from the fact that every function we define in R consists of three elements: its formal arguments, its body, and a pointer to its enclosing environment. (Recall that in R the enclosing environment is the environment that was active at the time the function was defined, and that it is the second place, after the run-time environment, where R looks when it searches for names during the execution of the function.) Due to the importance of its enclosing environment, a function gets the name “closure.”
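The three elements named above can be inspected directly with the base R functions formals(), body(), and environment(). In this sketch the function factory make_adder() is hypothetical; each call to it returns a new closure whose enclosing environment is the run-time environment of that call:

```r
# Hypothetical function factory: each call returns a new closure
make_adder <- function(k) {
  function(x) x + k
}

add_three <- make_adder(3)

formals(add_three)      # the formal arguments (just x)
body(add_three)         # the body: x + k
environment(add_three)  # the enclosing environment: where make_adder(3) ran

# k is not defined inside add_three(), so R finds it in the enclosing environment
add_three(10)
```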