3.1 Motivation for Functions

Suppose you have the job of printing out the word “Kansas” to the console four times, each time on a new line. The code for this is easy enough:

cat("Kansas\n")

## Kansas

cat("Kansas\n")

## Kansas

cat("Kansas\n")

## Kansas

cat("Kansas\n")

## Kansas

Now suppose that you have the job of printing out any given word to the console four times. You could, of course, simply copy and paste the above code to a new place in your R script and then change “Kansas” to the desired word in each of the four calls. But that’s an awful lot of work.

You could cut down on the work a bit if you use a variable:

word <- "Kansas"
cat(word, "\n", sep = "")
cat(word, "\n", sep = "")
cat(word, "\n", sep = "")
cat(word, "\n", sep = "")

The advantage of this approach is that, after you copy and paste, you only have to make one change: substitute the desired word for “Kansas” in the assignment to the variable word.

If you were writing a program that involved many four-line print-outs of various words, then you could carry on this way quite a while, producing many similar five-line snippets of printing-code throughout your program.

But suppose that it occurs to you one day: maybe you don’t really need five lines of code. What if cat() supports “vector in, vector out”? If so, then we could take advantage of vectorization to obviate the need for repeated calls to cat().

We could try:

fourWords <- rep("Kansas", 4)
cat(fourWords, "\n")

## Kansas Kansas Kansas Kansas

That just prints “Kansas” four times on a single line, with the default separator (a space) between the instances, followed by a single newline at the end. So we need a newline along with each instance of Kansas.

So instead we try:

fourWords <- rep("Kansas\n", 4)
cat(fourWords)

## Kansas
##  Kansas
##  Kansas
##  Kansas

Not quite what we wanted: cat() inserts the default separator, a space, between consecutive elements. A space therefore follows every Kansas\n except the last, which indents lines 2, 3 and 4.
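To see where cat() places its separator, here is a minimal sketch that swaps the default space for a visible character. The asterisk is an arbitrary choice made for illustration and is not part of the original example.

```r
# Use a visible separator (an arbitrary choice for illustration) in place of
# the default space. cat() puts it between elements, not after the last one.
cat(rep("Kansas\n", 3), sep = "*")

## Kansas
## *Kansas
## *Kansas
```

The asterisk shows up at the start of lines 2 and 3, which is exactly where the stray space appeared before.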

No problem: let’s just set the separator to the empty string "":

fourWords <- rep("Kansas\n", 4)
cat(fourWords, sep = "")

## Kansas
## Kansas
## Kansas
## Kansas

Success at last!

If you wanted to implement this new idea throughout your program, you would have to search through the program for the many five-line snippets you created previously, replacing each one of them with the appropriate version of your clever one-liner. Not only is this a lot of work, it’s also quite error-prone: you could miss one or more of the snippets along the way, or on some occasion fail to modify the word within the one-liner to have the value you need at that point.

Accordingly, programmers try, as much as possible, to solve problems in a general way and to implement that general solution in one place in their program. Then they call upon that solution in the many different locations where it might be required.

Functions are one way in which programmers accomplish this. The following is a function that will print any given word four times, once on each line:

catFourTimes <- function(word) {
  wordWithNewline <- paste(word, "\n", sep = "")
  cat(rep(wordWithNewline, 4), sep = "")
}

Let’s see the function in use:

catFourTimes("Kansas")

## Kansas
## Kansas
## Kansas
## Kansas

It works like a charm! What’s more, once we get to thinking in terms of general solutions, we realize that our function might just as well print any given word any given number of times. So instead of catFourTimes() we might actually use the following:

manyCat <- function(word, n) {
  wordWithNewline <- paste(word, "\n", sep = "")
  lines <- rep(wordWithNewline, times = n)
  cat(lines, sep = "")
}

Does it work? Let’s see:

manyCat("Kansas", 5)

## Kansas
## Kansas
## Kansas
## Kansas
## Kansas

Yes indeed!

Let’s consider the advantages of writing functions:

Functions allow us to re-use code, rather than repeating the code throughout our program.
The more generally a function solves the problem, the more varied are the situations in which the function may be re-used.
If we have to change our approach to the problem (because our original solution was flawed, because new features are needed, or for any other reason), then we only have to implement the necessary change in the definition of our function, rather than in the many places in the program where the function is actually used.
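The last point can be sketched with manyCat() itself. Suppose, hypothetically, that we later decide to build the output with base R's writeLines(), which writes each element of a character vector followed by a newline. Only the definition changes, and every existing call keeps working as written. The revised body and the word "Oz" are assumptions made for illustration, not a recommendation from the text.

```r
# Original definition, as given earlier in this section.
manyCat <- function(word, n) {
  wordWithNewline <- paste(word, "\n", sep = "")
  lines <- rep(wordWithNewline, times = n)
  cat(lines, sep = "")
}

# Revised definition (hypothetical): same arguments, different internals.
# writeLines() appends the newline itself, so no paste() step is needed.
manyCat <- function(word, n) {
  writeLines(rep(word, times = n))
}

# Calls elsewhere in the program do not need to be touched.
manyCat("Kansas", 2)
manyCat("Oz", 3)

## Kansas
## Kansas
## Oz
## Oz
## Oz
```

Had the five-line snippets been pasted throughout the program instead, the same change would have required an edit at every one of those places.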

There is a well-known principle in computer programming called DRY, which is an acronym for “Don’t Repeat Yourself.” Computer code is said to be DRY when general solutions are defined in one place but are usable in many places, and when information needed in many places is defined authoritatively in one place. As a rule, DRY code is easy to develop, debug, read and maintain. The more you get into the habit of expressing solutions to problems in terms of functions, the “drier” your code will be.
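As a small sketch of the second half of the DRY principle, a value needed in many places can be stored once and referred to by name. The variable repeatCount and the word "Toto" are hypothetical, introduced only for this illustration.

```r
manyCat <- function(word, n) {
  wordWithNewline <- paste(word, "\n", sep = "")
  lines <- rep(wordWithNewline, times = n)
  cat(lines, sep = "")
}

# Hypothetical setting, defined in exactly one place.
repeatCount <- 2

# Every call refers to the setting by name instead of repeating the number.
manyCat("Kansas", repeatCount)
manyCat("Toto", repeatCount)

## Kansas
## Kansas
## Toto
## Toto
```

Changing the assignment to repeatCount changes the behavior of every call that uses it, with no need to hunt for copies of the number.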