3.5 Environments and Scope

3.5.1 Environments and Searching

In R an environment is a particular kind of data structure that helps the computer connect names to values. An environment can be thought of as a bag of names (names for vectors, functions, and all sorts of objects) along with a way (provided automatically by the computer) of getting from each name to the value that it represents. The process that the computer follows in order to connect a name to a value is called scoping. In R, as in many other computer languages, environments are what make scoping possible. In other words, environments are how R figures out what the names in any piece of code mean.
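The "bag of names" picture can be made concrete with a minimal sketch. It uses base R's new.env(), assign(), get() and ls(); the environment bag and the names stored in it are hypothetical, chosen only for illustration.

```r
# Create an empty environment: an empty "bag of names".
bag <- new.env()

# Put two names into the bag, each bound to a value.
assign("quadlingColor", "red", envir = bag)
assign("munchkinCount", 3, envir = bag)

# List the names in the bag, then follow one name to its value.
namesInBag <- ls(bag)                              # "munchkinCount" "quadlingColor"
colorValue <- get("quadlingColor", envir = bag)    # "red"
```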

R has a considerable number of environments—and environments can be created and destroyed throughout the course of an R session—but at any moment only one of them is active. The active environment is the first environment that R will examine when it needs to look up a name in an expression.
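As a small sketch of what "active" means, base R's environment() function reports the environment in which it is called. The helper name whereAmI is hypothetical, used only for this illustration; Section 3.5.3 explains why a different environment is active while a function's body runs.

```r
# At the console, the active environment is the Global Environment.
atConsole <- environmentName(environment())   # "R_GlobalEnv" when run at the console

# whereAmI() is a hypothetical helper: while its body runs,
# some environment other than the Global Environment is active.
whereAmI <- function() {
  identical(environment(), globalenv())
}
insideFunction <- whereAmI()   # FALSE
```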

The most familiar environment is the Global Environment, the one that is active when you are using R from the console. The names in the Global Environment, along with descriptions of the objects to which they refer, are shown in the Environment panel in the RStudio IDE. Alternatively, you can see the names in the active environment by using the ls() function:

ls()

Even better is ls.str(), which will give the name of each object along with a summary of what sort of object it is.

ls.str()

You can remove all of the names from your Global Environment by pressing the Broom icon in the IDE, or by using the rm() function:

rm(list = ls())

As we mentioned previously, the Global Environment is only one of many environments that exist in R. The search() function will show you a number of other environments:

search()
##  [1] ".GlobalEnv"             "package:ggridges"       "package:ggformula"     
##  [4] "package:ggstance"       "package:Matrix"         "package:lattice"       
##  [7] "package:R6"             "package:tigerData"      "package:forcats"       
## [10] "package:stringr"        "package:dplyr"          "package:purrr"         
## [13] "package:readr"          "package:tidyr"          "package:tibble"        
## [16] "package:tidyverse"      "package:babynames"      "package:bindrcpp"      
## [19] "package:magrittr"       "package:bcscr"          "package:TurtleGraphics"
## [22] "package:grid"           "package:ggplot2"        "package:mosaicData"    
## [25] "tools:rstudio"          "package:stats"          "package:graphics"      
## [28] "package:grDevices"      "package:utils"          "package:datasets"      
## [31] "package:methods"        "Autoloads"              "package:base"

search() returns a character vector of names of environments. The first element is the Global Environment itself. The second element is an environment that is associated with the last package that R loaded, the third is an environment associated with the next-to-last package, and so on. Each item on the list is considered to be the parent of the environment that came before it. Thus, the Global Environment has a parent environment, a grandparent environment, and so on. The complete sequence of environments is called the search path.
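The parent-to-parent structure of the search path can be checked directly. The following is a minimal sketch using base R's parent.env(); it relies on the fact that the chain ends at a special empty environment, which sits above package base and is not listed by search(). The variable names e and pathNames are just for illustration.

```r
# Start at the Global Environment and follow parent links
# until the empty environment (the top of the chain) is reached.
e <- globalenv()
pathNames <- character(0)
while (!identical(e, emptyenv())) {
  pathNames <- c(pathNames, environmentName(e))
  e <- parent.env(e)
}

pathNames                                # names of the environments visited
length(pathNames) == length(search())    # TRUE: one environment per search-path entry
```

The names reported by environmentName() differ slightly from those in search(): for example, the Global Environment appears as "R_GlobalEnv" and the last entry as "base".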

Just as the Global Environment has names for objects, so also the packages have names available for use. When you write code that contains a name, R will search for that name: first in your Global Environment, then in its parent environment (the environment of the last package loaded), and so on until it reaches the final package on the list: package base. If it can’t find the name anywhere, then it will throw an error, telling you that the object was “not found.”
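Here is a brief sketch of a failed search, assuming that the made-up name noSuchNameAnywhere_xyz is not defined anywhere on your search path. It uses base R's exists() to test for the name and tryCatch() to capture the error message instead of halting.

```r
# exists() walks the search path and reports whether the name is found.
nameFound <- exists("noSuchNameAnywhere_xyz")    # FALSE

# Using the name anyway makes R throw an error; capture its message.
msg <- tryCatch(
  noSuchNameAnywhere_xyz,
  error = function(e) conditionMessage(e)
)
msg    # "object 'noSuchNameAnywhere_xyz' not found"
```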

Let’s try this with a few examples. First, define a (hopefully) new variable:

quadlingColor <- "red"

Then use it in some code:

cat(quadlingColor, ", white and blue\n", sep ="")
## red, white and blue

R was able to complete your request because:

  • it found the name quadlingColor on its search path;
  • it found the name cat on its search path (and found that it referred to the cat() function)

You can tell where R found these things:

find("quadlingColor")
## [1] ".GlobalEnv"
find("cat")
## [1] "package:base"

R found quadlingColor in the first place it looked, whereas it had to go all the way up to package base to find an object with the name cat that looked like it was the name of a function.

What happens if the same name gets used in two different environments? Let’s investigate. First get a print of cat():

cat
## function (..., file = "", sep = " ", fill = FALSE, labels = NULL, 
##     append = FALSE) 
## {
##     if (is.character(file)) 
##         if (file == "") 
##             file <- stdout()
##         else if (startsWith(file, "|")) {
##             file <- pipe(substring(file, 2L), "w")
##             on.exit(close(file))
##         }
##         else {
##             file <- file(file, ifelse(append, "a", "w"))
##             on.exit(close(file))
##         }
##     .Internal(cat(list(...), file, sep, fill, labels, append))
## }
## <bytecode: 0x7fc392034a70>
## <environment: namespace:base>

I got the definition of the cat() function, all the way up in package base.

Now try:

rep(cat, times = 3)
## Error in rep(cat, times = 3) : 
##   attempt to replicate an object of type 'closure'

I got an error! That’s because the only reference R could find for cat was to the function cat() in package base, and since a function isn’t a vector you can’t repeat it.10

Next, define a variable named cat:

cat <- "Pippin"

At this point, the identifier cat appears in at least two environments:

  • in the Global Environment, where it refers to the string “Pippin”;
  • in the environment associated with package base, where it refers to the cat() function.

We can verify the above assertions with find():

find("cat")
## [1] ".GlobalEnv"   "package:base"

Now try:

rep(cat, times = 3)
## [1] "Pippin" "Pippin" "Pippin"

This time it worked! The reason is that R found a character-vector named cat in the Global Environment.

Now try:

cat(cat, "is a cat\n")
## Pippin is a cat

Wait a minute: why did this work? Doesn’t the Global Environment come before package base in the search path? Yes it does, but since the first occurrence of cat was followed by an open parenthesis, R knew that it had to refer to a function. Hence it skipped the character vector in the Global Environment and kept looking along the search path for a function with the name cat, eventually finding our familiar cat() function in base.
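The same behavior can be reproduced without touching your Global Environment. This sketch wraps everything in base R's local(), which evaluates code in a temporary environment; the choice of sum as the shadowed name is arbitrary, and get() with mode = "function" is used to mimic the function-only search.

```r
result <- local({
  sum <- "not a function"     # the name sum now refers to a character vector

  # In call position, R skips the string and keeps searching for a function.
  total <- sum(1, 2, 3)

  # get() can be asked to perform the same function-only search explicitly.
  found <- get("sum", mode = "function")

  list(total = total, foundBaseSum = identical(found, base::sum))
})
result$total          # 6
result$foundBaseSum   # TRUE
```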

Well then, consider what happens if we do this:

cat <- function(...) {
  "Meow!"
}

We have defined a function cat() that returns “Meow!” no matter what it is given as input.11

Now try again:

cat(cat, "is a cat\n")
## [1] "Meow!"

Since the cat() we defined is a function in the Global Environment, which comes before base in the search path, R uses our cat() instead of the version in base. R programmers say that the base version of cat has been masked.

If I want to keep my cat() and still use the base version of cat() as well, I can do that. In order to be sure of getting a particular package’s version of a function, put the name of the package and then two colons before the function name, like this:

base::cat("This is the good ol' cat() we have been missing!")
## This is the good ol' cat() we have been missing!
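The whole cycle of masking, bypassing the mask, and removing it can be sketched with another base function. The choice of mean and the string "masked!" are arbitrary, made up for this illustration.

```r
# Mask base's mean() with a function of our own.
mean <- function(x) "masked!"

ours   <- mean(c(1, 2, 3))          # "masked!": our version is found first
theirs <- base::mean(c(1, 2, 3))    # 2: the package-qualified name bypasses the mask

# Remove our version; the plain name reaches base's mean() again.
rm(mean)
restored <- mean(c(1, 2, 3))        # 2
```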

But we don’t like our cat() so very much: let’s remove it:

rm(cat)

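Note that the character vector cat is gone as well. It was already overwritten when we assigned the function to the name cat, because a name can be bound to only one object at a time in a given environment.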

3.5.2 A Note on Parameters vs. Arguments

It’s important to keep in mind the distinction between a parameter of a function, on the one hand, and on the other hand the argument that gets supplied to that parameter.

When you are just starting out in R programming, this distinction can be difficult to remember, especially when the parameter and the argument have the same name. Now that we understand environments, though, we can get a grip on this tricky situation.

Let’s proceed by way of an example.

First of all, clear out your Global Environment:

rm(list = ls())

Next make a simple function that adds three to any given number. Our function will take one parameter n (the number to which 3 is to be added), and the default value of n shall be 4.

addThree <- function(n = 4) {
  n + 3
}

Next, bind the name n to the value 2:

n <- 2

You should now have two items in your Global Environment. Confirm this:

ls.str()
## addThree : function (n = 4)  
## n :  num 2

Now call the function as follows:

addThree(n = 5)
## [1] 8

Let’s recall how this works:

  • R sees that you want to assign the value 5 to the parameter n.
  • R executes the code in the body of the function. All is well.

Now call the function as follows:

addThree()
## [1] 7

Let’s recall how this works:

  • R sees that you did not assign anything to the parameter n.
  • “That’s OK,” says R. “I’ll use the default value of 4 for n.”
  • R executes the code in the body of the function. All is well.

Next, call the function as follows:

addThree(n = n)
## [1] 5

Let’s think about how this works:

  • R sees that you want to assign something to the parameter n. Apparently it is the value of a name n in some environment.
  • “Fine,” says R. “I’ll look up the value of this n thingie, if ever I have to use it in a computation.”
  • R executes the one line of code in the body of the function.
  • “Well, I’ll be darned,” says R, “I do need the value of this n thingie after all. I’ll look it up.”
  • R looks for n, finding it in the Global Environment. Apparently it’s bound to 2.
  • R computes \(2+3\) and returns \(5\). All is well.
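The "if ever I have to use it" step above reflects the fact that R does not look up an argument until the body of the function actually needs it. Here is a minimal sketch, assuming the made-up name thisNameDoesNotExist_xyz is not defined anywhere; the function ignoreArg is hypothetical.

```r
# ignoreArg() has a parameter n but never uses it in its body.
ignoreArg <- function(n) {
  "n was never needed"
}

# No error occurs, because R never has to look up the argument.
lazyResult <- ignoreArg(n = thisNameDoesNotExist_xyz)
lazyResult    # "n was never needed"
```

Had the body used n, the lookup would have taken place and failed.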

Now call the function as follows:

addThree(n)
## [1] 5

Again let’s consider how this works:

  • R sees the n. Since the function has only one parameter, R figures that you mean to assign the value of n (in some environment) to its parameter n.
  • Everything now proceeds just as before, with 5 being the number returned.

Now let’s remove n from the Global Environment:

rm(n)

Now call the function again, in the following way:

addThree(n = n)
## Error in addThree(n = n) : object 'n' not found

Can you see why we got an error? This time when R goes looking for n, it can’t find it: n is no longer in the Global Environment, nor is it anywhere else along the search path. Accordingly R throws the error.

The call addThree(n) will elicit the same error message, for the same reason.

The moral of the story is that parameters really, really are NOT the same thing as arguments, even when a parameter and an argument happen to be called by the same name.
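One way to keep the two ideas apart is to give the argument a different name from the parameter. In this sketch the variable m is hypothetical; formals() is base R's way of inspecting a function's parameters.

```r
addThree <- function(n = 4) {
  n + 3
}

m <- 10                          # m lives in the environment where we make the call

names(formals(addThree))         # "n": the parameter belongs to the function
byName     <- addThree(n = m)    # 13: the value of m is the argument supplied to n
byPosition <- addThree(m)        # 13: same thing, matched by position
```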

3.5.3 Function Environments

Let’s summarize what we have learned so far:

  • An environment is a collection of names associated with objects.
  • The Global Environment is the environment that is active when we are working from the console.
  • When R needs to look up a name, it consults a search path.
  • When we are in the Global Environment the search path starts there, and continues to:
    • the last package loaded (the parent environment),
    • the package before that (the “grandparent environment”),
    • and so on …
    • … up to package base.
  • The first object of the right type having the given name that is found along the search path is the object that R associates with the name.

Just as the Global Environment is a child of the last package loaded, so the Global Environment can have children of its own. In fact a child-environment is created whenever we define a function in the Global Environment and then run it.

Consider the following code:

a <- 10
b <- 4
f <- function(x, y) {
  a <- 5
  print(ls())
  cat("a is ", a, "\n",
      "b is ", b, "\n",
      "x is ", x, "\n",
      "y is ", y, "\n", sep = "")
}

Note that a and b are now in the Global Environment, where the value of a is 10 and the value of b is 4.

We have defined the function f(); pretty soon we will call it. The moment we do so, we will no longer be working directly from the console: instead R will hand control over to the function so that it can execute the code in its body. This means that the Global Environment will no longer be the active environment. Instead the active environment will be one that is created at the moment when f is called. Accordingly, it is called the run-time environment (also known as the evaluation environment) of f.

Let’s go ahead and call f():

f(x = 2, y = 3)
## [1] "a" "x" "y"
## a is 5
## b is 4
## x is 2
## y is 3

In the body of the function ls() prints out all of the names in the active environment—which at the moment is the run-time environment of f(). This environment contains a with a value of 5—the a with a value of 10 is masked from it—along with the x and y that were passed into the function as arguments. The a variable having the value 5 that was created within the body of the function is said to be local to the function. Thus we can say that the run-time environment of a function consists of the variables that are local to the function and the arguments that were passed into it.

Observe that b is not a name in the function’s run-time environment: instead it is in the Global Environment. Nevertheless R can “find” b from within the function because R considers the Global Environment (the environment in which f() was defined) to be the parent of the run-time environment12, and so the Global Environment is the second place R will look when searching for an object named b. Computer scientists say that b is within the scope of the function.
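The parent relationship can be inspected from inside a function. The following sketch uses a hypothetical function whoIsMyParent() together with base R's environment(), parent.env() and exists(); the inherits argument of exists() controls whether ancestor environments are searched.

```r
b <- 4

whoIsMyParent <- function() {
  runTimeEnv <- environment()    # the run-time environment of this call
  list(
    # Is the parent the environment in which the function was defined?
    parentIsDefiningEnv = identical(parent.env(runTimeEnv),
                                    environment(whoIsMyParent)),
    # Is b a name in the run-time environment itself?
    bIsLocal = exists("b", envir = runTimeEnv, inherits = FALSE),
    # Can b be found by searching the ancestors as well?
    bIsInScope = exists("b", envir = runTimeEnv, inherits = TRUE)
  )
}

report <- whoIsMyParent()
report    # TRUE, FALSE, TRUE
```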

What happens to the run-time environment when f() finishes executing code? R simply destroys it. It’s as if the a, x and y came to life “inside of” f() but died as soon as f() stopped working.

The next time f() is called, a new run-time environment will be created to enable the code in the body of f() to do its work.
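To see that each call gets its own run-time environment, one can have a function hand back the environment it was running in. The function showEnv() below is hypothetical, written only for this check.

```r
# showEnv() returns the run-time environment of the call that produced it.
showEnv <- function() {
  environment()
}

firstCall  <- showEnv()
secondCall <- showEnv()

# Two calls, two distinct environments.
sameEnvironment <- identical(firstCall, secondCall)    # FALSE
```

Returning the environment keeps it alive for inspection; in ordinary use nothing refers to it after the call, and R is free to discard it.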

One consequence of the ephemeral nature of run-time environments is that they are not accessible from parent environments. Thus if the active environment is the Global Environment and you run across a reference to a, you will never “find” the a “inside of” f() or “inside of” any other function, for that matter. R looks only in the active environment and in ancestor-environments, never in child-environments, and besides, the run-time environment no longer exists once the function has finished executing.

Let’s make sure of this with an example.

a <- 5
f <- function() {
  a <- 10
  print(ls.str()) # print out the active environment
}
f()
## a :  num 10

Did calling f() change the value of a in the Global Environment? Let’s see:

a
## [1] 5

Nope, a is still 5.

This is a very good thing. It would be very confusing if assignment to a variable within a function were to “change the values” of variables—happening to have the same name—that were declared outside of the function’s environment.
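A complementary check is that a name created only inside a function cannot be found from outside it at all. This sketch assumes the made-up name hiddenValue_xyz is not defined anywhere else; makeLocal() is a hypothetical function.

```r
makeLocal <- function() {
  hiddenValue_xyz <- 42    # local to the run-time environment of this call
  invisible(NULL)
}

makeLocal()

# From outside the function, the local name is nowhere on the search path.
visibleOutside <- exists("hiddenValue_xyz")    # FALSE
```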

3.5.4 Practice Exercises

  1. Starting from a new R session and an empty Global Environment, I run the following code:

    m <- 5
    f <- function(n) {
      m <- 10
      a <- m + 5
      n^2
    }

    What two items are now in my Global Environment?

  2. I then run the following code:

    g <- f(10)

    What items are now in my Global Environment? What is the value of g? What is the value of m?

  3. When f() was called, a runtime environment was created for it. By the time it got down to the line n^2, what items were in that environment, and what were their values?

3.5.5 Solutions to the Practice Exercises

  1. There are now two items in the Global Environment:
    • m: its value is 5
    • f, the function I defined.
  2. Now there are three items in the Global Environment:
    • m: its value is still 5. (The m in the run-time environment is not in the “scope” of the Global Environment.)
    • f, the function I defined, is still there.
    • I also have g: its value is \(10^2 = 100\).
  3. The items in the runtime environment were:
    • the parameter n, with a value of 10;
    • m, with a value of 10;
    • a, with a value of 15.
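If you would like to confirm the answers to the first two exercises yourself, the code from the exercises can simply be run and the results inspected, as in this short sketch.

```r
m <- 5
f <- function(n) {
  m <- 10        # local to the run-time environment of f()
  a <- m + 5     # also local
  n^2
}

g <- f(10)
g    # 100
m    # 5: the assignment inside f() did not touch this m
```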

  10. In R, almost all functions are called “closures.”

  11. The ellipses, which we will discuss further in Chapter 9, allow the function to be passed any arguments at all, or even none.

  12. For any function that is created in R, the enclosing environment of the function is set to be the environment that was active when the function was defined. This feature is known as lexical scoping. Some other languages use dynamic scoping, meaning that the enclosing environment is the environment that is active when the function is called. At this stage in your work with R, when you almost always create functions while working in the Global Environment, it can be a bit difficult to become aware of situations when the distinction between lexical and dynamic scoping makes a practical difference. However, the difference is there and it constantly affects your work with R, especially when you use a function from an R package (see Section 3.6 for more on packages). Since the environment associated with a package is the enclosing environment for any R function defined in that package, functions from packages behave in a standard, expected way, no matter what environment, Global or otherwise, they are called in. For a practical application of lexical scoping that is not related to packages, consult Chapter 6 of Grolemund (2014).