14.2 The Functional Programming Paradigm

Let us now turn to the second of the two major programming paradigms that we study in this chapter: functional programming.

14.2.1 The Ubiquity of Functions in R

Let’s look a bit more closely at our code snippet. Notice how prominently functions figure in it, on nearly every line. In fact, every line calls at least one function! This might seem unbelievable: after all, consider the line below:

numsm111 <- m111survey[, isNumerical]

There don’t appear to be any functions being called, here! But in fact two functions get called:

  1. The so-called assignment operator <- is actually a function in disguise: the more official—albeit less readable—form of variable <- value is:

    `<-`(variable, value)

    Thus, to assign the value 3 to the variable a, one could write:

    `<-`(a, 3)
    a   # check that a is really 3
    ## [1] 3
  2. The sub-setting operator [, more formally known as extraction (see help(Extract)), is also a function. The expression m111survey[, isNumerical] is actually the following function-call in disguise (the empty argument stands for the omitted row index):

    `[`(m111survey, , isNumerical)
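To check that these prefix forms really are equivalent to the familiar operator syntax, here is a small sketch. The data frame toy and the logical vector keep are made-up stand-ins (assumptions for illustration), so that the code runs without the m111survey data:

```r
# A small, made-up data frame (a stand-in for m111survey)
toy <- data.frame(height = c(70, 65, 72),
                  sex = c("m", "f", "m"),
                  sleep = c(7, 8, 6.5))
keep <- c(TRUE, FALSE, TRUE)   # plays the role of isNumerical

# Operator syntax and prefix (function-call) syntax give the same result.
viaOperator <- toy[, keep]
viaFunction <- `[`(toy, , keep)   # empty second argument: all rows
identical(viaOperator, viaFunction)
## [1] TRUE

# Even arithmetic is a function call:
`+`(2, 3)
## [1] 5
```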

Indeed functions are ubiquitous in R. This is part of the significance of the following well-known remark by a developer of S, the precursor-language of R:

“To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.”

—John Chambers

The second slogan indicates that functions are everywhere in R. It also corresponds to the first principle of the functional programming paradigm, namely:

Computation is regarded as the evaluation of functions.

14.2.2 Functions as First-Class Citizens

So functions are ubiquitous in R. Another interesting thing about them is that even though they seem to be associated with procedures—after all, they make things happen—they are, nevertheless, also objects. They are data, or “stuff” if you like.

This may not seem obvious at first. But look at the following code, where you can ask what type of thing a function is:

typeof(is.numeric)
## [1] "builtin"

The so-called “primitive” functions of R, which are implemented in C code rather than in R, are “builtin” objects (a few of them, such as `[` and `<-`, report the type “special” instead). On the other hand, consider this user-defined function:

f <- function(x) x+3
typeof(f)
## [1] "closure"

Functions other than primitive functions are objects of type “closure.”35
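A closure is made up of formal arguments, a body, and an enclosing environment, and base R provides formals(), body() and environment() to inspect each part. The sketch below applies them to a function like f, and then to a primitive, which has none of these parts:

```r
f <- function(x) x + 3

formals(f)      # the formal arguments: a list with one element named x
body(f)         # the body: the expression x + 3
environment(f)  # the enclosing environment (the Global Environment
                # if f was defined at the console)

# Primitive functions are not closures, so they lack these parts:
formals(is.numeric)
## NULL
body(is.numeric)
## NULL
environment(is.numeric)
## NULL
```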

If a function can be a certain type of thing, then it must be a “thing”—an object, something you can manipulate. For example, you can put functions in a list:

lst <- list(is.numeric, f)
lst
## [[1]]
## function (x)  .Primitive("is.numeric")
## 
## [[2]]
## function(x) x+3
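Once functions are stored in a list, they can be pulled out and called like any other element. A minimal sketch, repeating the definitions so that it runs on its own (the input value 10 and the name results are arbitrary choices):

```r
f <- function(x) x + 3
lst <- list(is.numeric, f)

lst[[1]](10)    # calls is.numeric(10)
## [1] TRUE
lst[[2]](10)    # calls f(10)
## [1] 13

# Apply every function in the list to the same input:
results <- lapply(lst, function(fn) fn(10))
results
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] 13
```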

Very importantly, you can make functions serve as arguments to other functions, and functions can return other functions as their results. The following example demonstrates both of these possibilities.

cuber <- function(f) {
  g <- function(x) f(x)^3
  g
}
h <- cuber(abs)
h(-2)  # returns |-2|^3 = 2^3 = 8
## [1] 8

In fact, in R functions can be treated just like any other object. In computer programming, we say that such functions are first-class citizens.
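As a further illustration, a function can be bound to a second name, or created on the spot without any name at all. The names plusThree, alias and k are invented for this sketch, and cuber is repeated so that the code runs on its own:

```r
cuber <- function(f) {
  g <- function(x) f(x)^3
  g
}

plusThree <- function(x) x + 3
alias <- plusThree          # a function can be bound to another name
alias(1)
## [1] 4

# An anonymous function passed directly as an argument:
k <- cuber(function(x) x + 1)
k(2)                        # (2 + 1)^3
## [1] 27
```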

Although it is not often stated as a separate principle of the functional programming paradigm, the following principle holds in languages that provide support for functional programming:

Functions are first-class citizens.

14.2.3 Minimize Side Effects

In the code-snippet under consideration, we note that there are two types of functions:

  • functions that return a value;
  • functions that provide output to the console or make a change in the Global Environment.

Examples of the first type of function included:

  • length()
  • names()
  • seq_along()
  • is.numeric()
  • the extraction-function `[`()

A function that produced output to the console was str().

The assignment function `<-`() added cols, isNumerical and numsm111 to the Global Environment, and also made changes to isNumerical in the course of the for-loop.

Of course we have seen examples of functions that do two of these things at once, for example:

myFun <- function(x) {
  cat("myFun is running!\n")  # output to console
  x + 3                       # return a value
}
myFun(6)
## myFun is running!
## [1] 9

In computer programming, output to the console, along with changes of state—changes to the Global Environment or to the file structure of your computer—are called side-effects. Functions that only return values and do not produce side-effects are called pure functions.

A third principle of the functional programming paradigm is:

Functions should be pure.

Now this principle is difficult to adhere to, and in fact if you were to adhere strictly to it in R then your programs would never “do” anything. There do exist quite practical programming languages in which all of the functions are pure. This leads to some very interesting features (for example, the order in which operations are evaluated doesn’t affect what a function returns), but these “purely functional” languages manage purity by having other objects besides functions produce the necessary side-effects. In R we happily let our functions have side-effects: we certainly want to do some assignment, and print things out to the console from time to time.
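One common way to live with this tension in R (a style suggestion, not something the language enforces) is to keep the computation in a pure function and confine side-effects to a thin wrapper. In the sketch below the names addThreePure and reportAddThree are hypothetical:

```r
# Pure: the result depends only on x, and nothing outside is touched.
addThreePure <- function(x) {
  x + 3
}

# Impure wrapper: all console output is confined here.
reportAddThree <- function(x) {
  result <- addThreePure(x)
  cat("The result is", result, "\n")   # side-effect: console output
  invisible(result)                    # still return the value, quietly
}

addThreePure(6)
## [1] 9
reportAddThree(6)
## The result is 9
```

With this split, the pure part can be tested and reused without producing any console output.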

One way that R does support the third principle of functional programming is that it makes it easy to avoid having your functions modify the Global Environment. To see this consider the following example:

addThree <- function(x) {
  heavenlyHash <- 5
  x+3  # returns this value
}
result <- addThree(10)
result
## [1] 13
heavenlyHash
## Error: object 'heavenlyHash' not found

This is as we expect: the variable heavenlyHash exists only in the run-time environment that is created in the call to addThree(). As soon as the function finishes execution that environment dies, and heavenlyHash dies along with it. In particular, it never becomes part of the Global Environment.

If you really want your functions to modify the Global Environment (or any environment other than their run-time environment, for that matter), then you have to take special measures. You could, for example, use the super-assignment operator <<-:

addThreeSideEffect <- function(x) {
  heavenlyHash <<- 5
  x+3  # returns this value
}
result <- addThreeSideEffect(10)
result
## [1] 13
heavenlyHash
## [1] 5

The super-assignment operator looks for the name heavenlyHash in the parent environment of the run-time environment. If it finds heavenlyHash there then it changes its value to 5 and stops. Otherwise it looks in the next parent up, and so on until it reaches the Global Environment, at which point if it doesn’t find a heavenlyHash it creates one and gives it the value 5. In the example above, assuming you defined the function at the console, the parent environment is the Global Environment and the function has made a change to it: a side-effect.
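This lookup rule means that <<- does not always reach the Global Environment. In the sketch below (makeCounter, counter and count are hypothetical names), the parent environment of the inner function's run-time environment is the run-time environment of makeCounter(), which already contains count, so the search stops there:

```r
makeCounter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # finds count in the enclosing environment
    count
  }
}

counter <- makeCounter()
counter()
## [1] 1
counter()
## [1] 2

# No variable named count was created in the Global Environment
# (assuming there was none to begin with):
exists("count", envir = globalenv(), inherits = FALSE)
## [1] FALSE
```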

Except in the case of explicit assignment functions like `<-`(), changes made by functions to the Global Environment can be quite problematic. After all, we are used to using functions without having to look inside them to see how they do their work. Even if we once wrote the function ourselves, we may not remember how it works, so if it creates side effects we may not remember that it does, and calling it could interfere with other important work that the program is doing. (If the program already has heavenlyHash in the Global Environment and then we call a function that changes its value, we could be in for big trouble.) Accordingly, R supports the third principle of functional programming to the extent of making it easy for you to avoid function calls that change your Global Environment.

14.2.4 Procedures as Higher-Order Function Calls

The last principle of the functional programming paradigm that we will state here isn’t really a formal principle: it is really more an indication of the programming style that prevails in languages where functions are first-class objects and that provide other support for functional programming. The final principle is:

As much as possible, procedures should be accomplished by function calls. In particular, loops should be replaced by calls to higher-order functions.

A higher-order function is simply a function that takes other functions as arguments. R provides a nice set of higher-order functions, many of which substitute for iterative procedures such as loops. In subsequent sections we will study some of the most important higher-order functions, and see how they allow us to express some fairly complex procedures in a concise and readable way. You will also see how this style really blurs the distinction, so fundamental to procedural programming, between data and procedures. In functional programming, functions ARE data, and procedures are just function calls.
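As a preview of this style, the sketch below works out which columns of a data frame are numeric, first with a for-loop and then with a single call to the higher-order function vapply(). The data frame toy and the names isNumLoop and isNumHOF are made up for illustration; this is not the m111survey data.

```r
toy <- data.frame(height = c(70, 65, 72),
                  sex = c("m", "f", "m"),
                  sleep = c(7, 8, 6.5))

# Loop version: visit each column in turn.
isNumLoop <- logical(length(toy))
for (i in seq_along(toy)) {
  isNumLoop[i] <- is.numeric(toy[[i]])
}
isNumLoop
## [1]  TRUE FALSE  TRUE

# Higher-order version: pass is.numeric itself as an argument.
# The third argument says each call must return a single logical value.
isNumHOF <- vapply(toy, is.numeric, logical(1))

# Same answer (vapply also keeps the column names, so drop them to compare):
identical(isNumLoop, unname(isNumHOF))
## [1] TRUE
```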

14.2.5 Functional Programming: A Summary

For our purposes, the principles of the functional programming paradigm are as follows:

  • Computation consists in the evaluation of functions.
  • Functions are first-class citizens in the language.
  • Functions should only return values; they should not produce side-effects. (At the very least they should not modify the Global Environment unless they are dedicated to assignment in the first place.)
  • As much as possible, procedures should be written in terms of function calls. In particular, loops should be replaced by calls to higher-order functions.

  1. The term “closure” comes from the fact that every function we define in R consists of three elements: its formal arguments, its body, and a pointer to its enclosing environment. (Recall that in R the enclosing environment is the environment that was active at the time the function was defined, and that it is the second place, after the run-time environment, where R looks when looking up names during the execution of the function.) Due to the importance of its enclosing environment a function gets the name “closure.”