9 Lists

In this chapter we will study lists, another important data structure in R.

9.1 Introduction to Lists

So far the vectors that we have met have all been atomic, meaning that they can hold only one type of value. Hence we deal with vectors of type integer, or of type double, or of type character, and so on.

A list is a special kind of vector. Like any other vector it is one-dimensional, but unlike an atomic vector it can contain objects of any sort: atomic vectors, functions—even other lists! We say, therefore, that lists are heterogeneous vectors.

The most direct way to create a list is with the function list(). Let’s make a few lists:

lst1 <- list(
  name = "Dorothy", age = 12
)
df <- data.frame(
  x = c(10, 20, 30),
  y = letters[1:3]
)
lst2 <- list(
  vowels = c("a", "e", "i", "o", "u"),
  myFrame = df
)
lst3 <- list(
  nums = 10:20,
  bools = c(TRUE, FALSE, FALSE),
  george = lst1
)

Note that the elements of our three lists are not objects of a single data type. Note also that lst3 actually contains lst1 as one of its elements.
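
As a quick check on the idea of a heterogeneous vector, the sketch below builds a small list and inspects it. The name mixed and its contents are made up for illustration and do not appear elsewhere in the chapter.

```r
# a hypothetical list holding four different kinds of object
mixed <- list(
  nums = 1:3,
  word = "hello",
  fn = mean,
  inner = list(a = TRUE)
)

typeof(mixed)     # the underlying type is "list"
is.vector(mixed)  # TRUE: a list is still a vector
is.atomic(mixed)  # FALSE: but it is not an atomic vector
length(mixed)     # 4: one for each top-level element

# each element keeps its own type:
for (item in mixed) {
  print(class(item))
}
```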

When you call list() to create a list, you have the option to assign a name to one or more of the elements. In the code above we chose, for all three of our lists, to assign a name to each element.

Let’s print out a list to the console. We’ll choose lst1, since it’s rather small:

lst1

## $name
## [1] "Dorothy"
## 
## $age
## [1] 12

Note that the name of each element appears before the element itself is printed out, and that the names are preceded by dollar signs. This is a hint that you can access a single member of the list in a way similar to the frame$variable format for data frames:

lst1$age

## [1] 12

You can make an empty list, too:

emptyList <- list()

This is useful when you want to build up a list gradually, but you do not yet know how long it will be.

If you know how long your list will be, then do something like this:

initialList <- vector(mode = "list", length = 5)
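
The two approaches can be compared in a short sketch. The names squares and slots are illustrative only; the point is simply that a list can be grown from empty or filled in after its length has been reserved.

```r
# start empty and grow the list one element at a time
squares <- list()
for (i in 1:3) {
  squares[[i]] <- i^2
}
length(squares)  # the list now has 3 elements

# or reserve the slots first; each slot starts out as NULL
slots <- vector(mode = "list", length = 3)
is.null(slots[[1]])  # TRUE before anything is stored
for (i in 1:3) {
  slots[[i]] <- letters[1:i]
}
slots[[3]]  # the vector "a" "b" "c"
```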

9.1.1 Practice Exercises

Make a list called ozStuff. The list should contain three elements:
- The sequence of even numbers from 4 to 100. Its name should be lion.
- The uppercase letters of the alphabet. Its name should be scarecrow.
- The data frame fuel from the bcscr package. Its name should be wizard.
Suppose that ozStuff has been created in the previous problem. Describe in your own words what the following expression does:
```
ozStuff <- c(ozStuff, list(dorothy = "Kansas"))
```

9.1.2 Solutions to the Practice Exercises

Here’s one way to do it:

ozStuff <- list(
  lion = seq(4, 100, by = 2),
  scarecrow = LETTERS,
  wizard = bcscr::fuel
)

One way to describe it is as follows: the expression creates a new list consisting of all the elements of ozStuff together with a new element called dorothy (which is the string "Kansas"), and then binds the name “ozStuff” to that new four-element list.

9.2 Subsetting and Accessing

You can subset lists in the same way that you subset a vector: simply use the [ sub-setting operator. Let’s pick out the first two elements of lst3:

lst3[1:2]

## $nums
##  [1] 10 11 12 13 14 15 16 17 18 19 20
## 
## $bools
## [1]  TRUE FALSE FALSE

We get a new list consisting of the desired two elements.

Suppose we want to access just one element from lst3: the numbers, for instance. We could try this:

justNumbers <- lst3[1]
justNumbers

## $nums
##  [1] 10 11 12 13 14 15 16 17 18 19 20

Now suppose that we want to access the third number in the nums vector. You might think this would work fine:

justNumbers[3]

## $<NA>
## NULL

Wait a minute! The third number in nums is 12: so why are we getting NA?

Look carefully again at the printout for justNumbers:

justNumbers

## $nums
##  [1] 10 11 12 13 14 15 16 17 18 19 20

The $nums gives us the clue: justNumbers is not just the vector nums; in fact it’s not an atomic vector at all. It is a list whose only element is a vector with the name nums. Another way to see this is to check the length of justNumbers:

length(justNumbers)

## [1] 1

The fact is that the sub-setting operator [, applied to lists, always returns a list. If you want access to an individual element of a list, then you need to use the double-bracket [[ operator:

reallyJustNumbers <- lst3[[1]]
reallyJustNumbers

##  [1] 10 11 12 13 14 15 16 17 18 19 20

Of course if an element of a list is named, then you may also access it with the dollar sign:

lst3$nums

##  [1] 10 11 12 13 14 15 16 17 18 19 20
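
The difference between the two operators can be checked directly with class(). The sketch below uses a made-up list called pets (not one of the chapter's lists) and also shows how access extends to a list nested inside another list.

```r
# a hypothetical list with one nested list inside it
pets <- list(
  pet = c("Toto", "Fluffy"),
  ages = c(3, 5),
  owner = list(name = "Dorothy", age = 12)
)

class(pets[1])      # "list": single brackets keep the list wrapper
class(pets[[1]])    # "character": double brackets extract the element

pets[["ages"]][2]         # double brackets also accept a name in quotes
pets$owner$name           # dollar signs can be chained for nested lists
pets[["owner"]][["age"]]  # the same idea with double brackets
```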

From time to time it’s useful to “flatten out” a list into a vector of values of its elements. This is accomplished by the function unlist():

unlist(lst1)

##      name       age 
## "Dorothy"      "12"

As the example above shows, you have to exercise caution with unlist(). Since unlist() returns an atomic vector, when it encounters values of different types it has to coerce them all to a single type. In the competition between double and character types, character wins, so you end up with a vector of strings.
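
The same coercion happens when the list contains another list. The following sketch, built on a made-up list called nested, shows how unlist() flattens every level and how it constructs the names of the result.

```r
# a hypothetical nested list
nested <- list(
  nums = 1:2,
  person = list(name = "Dorothy", age = 12)
)

flat <- unlist(nested)
flat
class(flat)  # "character": every value was coerced to a string
names(flat)  # names are built from the element names at each level

# when all of the elements are numeric, nothing is lost:
unlist(list(a = 1:2, b = 3.5))
```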

9.2.1 Practice Exercises

These exercises involve the following list:

grabBag <- list(
  letters = letters,
  as.character(1:10),
  df = bcscr::fuel
)

Observe that the second element of this list was NOT given a name.

Describe in words what grabBag[2:3] is.
Describe in words what grabBag[3] is.
Describe in words what grabBag[[3]] is.
Find two ways to access the letter "d" in the first element of grabBag.
Find a way to access the last five elements of the second element of grabBag.
Find two ways to access the variable speed in the data frame in grabBag.

9.2.2 Solutions to the Practice Exercises

grabBag[2:3] is a list containing two elements: the vector of whole numbers from 1 to 10 turned into strings, and the data frame fuel from the bcscr package.
grabBag[3] is a list containing just one element: the data frame fuel from the bcscr package.
grabBag[[3]] is the data frame fuel from the bcscr package.
Here are two ways:
```
grabBag$letters[4]
grabBag[[1]][4]
```
Try this:
```
grabBag[[2]][6:10]
```
Here are two ways:
```
grabBag$df$speed
grabBag[[3]]$speed
```

9.3 Some Applications of Lists

9.3.1 Splitting

Sometimes it is useful to split a vector or data frame into pieces according to the value of a variable. For example, from m111survey we might like to have separate data frames for each of the three seating preferences. We can accomplish this with the split() function:

bySeat <- split(m111survey, f = m111survey$seat)

If you run the command str(bySeat), you find that bySeat is a list consisting of three data frames:

1_front: the frame of all subjects who prefer the Front;
2_middle: the frame of all subjects who prefer the Middle;
3_back: the frame of all subjects who prefer the Back.

Now you can carry on three separate analyses, working with one frame at a time.

There is a pitfall of which you should be aware. If you try to access any one of the frames by its name, you will get an error:

bySeat$1_front

## Error: unexpected numeric constant in "bySeat$1"

The reason is that variable names cannot begin with a number! You have two options, here. You could access a single frame by using the name in quotes:

bySeat[["1_front"]]

Your second option is to use the index of the element you want:

bySeat[[1]]
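
A third possibility, not covered above, is to wrap the name in backticks, which lets R accept a name that is not syntactically valid. The sketch below uses a small made-up data frame (toySurvey) in place of m111survey, so that it can be run on its own.

```r
# a made-up stand-in for m111survey
toySurvey <- data.frame(
  height = c(60, 72, 65, 70, 68, 63),
  seat = c("1_front", "2_middle", "3_back",
           "1_front", "3_back", "2_middle")
)
toyBySeat <- split(toySurvey, f = toySurvey$seat)
names(toyBySeat)

# three equivalent ways to get the first frame:
toyBySeat[["1_front"]]
toyBySeat[[1]]
toyBySeat$`1_front`   # backticks around the non-syntactic name
```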

9.3.2 Returning Multiple Values

Lists combine many different sorts of objects into one object. This makes them very useful in the context of certain functions.

Consider, for example, the drunken-turtle simulation from Section 8.5.3:

drunkenSim <- function(steps = 1000, reps = 10000, close = 0.5, 
                       seed = NULL, table = FALSE) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  
  returns <- numeric(reps)
  
  for (i in 1:reps) {
    angle <- runif(steps, 0, 2*pi)
    xSteps <- cos(angle)
    ySteps <- sin(angle)
    
    x <- cumsum(xSteps)
    y <- cumsum(ySteps)
    
    dist <- sqrt(x^2 + y^2)
    # use the close argument rather than a hard-coded 0.5
    closeReturn <- (dist < close)
    returns[i] <- sum(closeReturn)
  }
  
  if ( table ) {
    cat("Here is a table of the number of close returns:\n\n")
    tab <- prop.table(table(returns))
    print(tab)
    cat("\n")
  }
  cat("The average number of close returns was:  ", 
      mean(returns), ".", sep = "")
}

Suppose that we would like to store several of the results of the simulation:

the vector of the number of close returns on each repetition;
the table made from the close-returns vector;
the mean number of returns.

Unfortunately a function can only return one object.

The solution to this problem is to make a list of the three objects we want, and then return the list. We can rewrite the function so as to make all output to the console optional. The function will construct the list and return it invisibly.

drunkenSimList <- function(steps = 1000, reps = 10000, close = 0.5, 
                       seed = NULL, verbose = FALSE) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  
  # get the returns:
  returns <- numeric(reps)
  for (i in 1:reps) {
    angle <- runif(steps, 0, 2*pi)
    xSteps <- cos(angle)
    ySteps <- sin(angle)
    
    x <- cumsum(xSteps)
    y <- cumsum(ySteps)
    
    dist <- sqrt(x^2 + y^2)
    # use the close argument rather than a hard-coded 0.5
    closeReturn <- (dist < close)
    returns[i] <- sum(closeReturn)
  }
  # compute the table and the mean:
  tableReturns <- table(returns)
  meanReturns <- mean(returns)
  
  # handle output to console if user wants it
  if ( verbose ) {
    cat("Here is a table of the number of close returns:\n\n")
    print(prop.table(tableReturns))
    cat("\n")
    cat("The average number of close returns was:  ", 
      meanReturns, ".", sep = "")
  }
  
  # assemble the desired three items into a list
  # (for convenience, name the items)
  results <- list(tableReturns = tableReturns,
                  meanReturns = meanReturns,
                  returns = returns)
  # return the list
  invisible(results)
}

Now we can run the function simply to acquire the simulation results for later use:

simResults <- drunkenSimList(seed = 3939)

We can use any of the results at any time and in any way we like:

cat("On the first ten repetitions, the numbers of close returns were:\n\n\t",
    simResults$returns[1:10])

## On the first ten repetitions, the numbers of close returns were:
## 
##   0 6 4 4 2 0 2 5 2 4
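
The same pattern works for any function that needs to hand back several results. Here is a minimal sketch with a much smaller simulation; rollStats() is a hypothetical function invented for this illustration, not part of the drunken-turtle code.

```r
# hypothetical helper: simulate die rolls and return three results in a list
rollStats <- function(rolls = 100, seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  dice <- sample(1:6, size = rolls, replace = TRUE)
  results <- list(rolls = dice,
                  tableRolls = table(dice),
                  meanRolls = mean(dice))
  # returned invisibly, so nothing prints unless the user asks
  invisible(results)
}

rollStats(seed = 1)                      # prints nothing to the console
out <- rollStats(rolls = 50, seed = 1)   # but the results can be stored
names(out)
length(out$rolls)
```

Because the list is returned invisibly, calling the function on its own line produces no console output, yet every item remains available by name once the result is assigned.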

9.3.3 Storing Results in a List

Recall the Oz companions walking through a meadow, picking flowers (see Section 4.4). In a previous Practice Exercise (see Section 7.8.4), we stored the results of their meadow-walks in a data frame.

Sometimes it can be more convenient to store results in a list. Let’s modify our meadow-work to accomplish this.

Recall the flowers in the field:

flower_colors <- c("blue", "red", "pink", "crimson", "orange")

We write a helper-function to return the vector of flowers picked by a single person:

walk_meadow_vec <- function(color, wanted) {
  picking <- TRUE
  ## the following will be extended to hold the flowers picked:
  flowers_picked <- character()
  desired_count <- 0
  while (picking) {
    picked <- sample(flower_colors, size = 1)
    flowers_picked <- c(flowers_picked, picked)
    if (picked == color) desired_count <- desired_count + 1
    if (desired_count == wanted) picking <- FALSE
  }
  ## return the vector of flowers picked:
  flowers_picked
}

Now we write the function to make the list of results:

all_walk_list <- function(people, favs, numbers) {
  ## initialize a list of the required length:
  lst <- vector(mode = "list", length = length(people))
  for (i in 1:length(people)) {
    fav <- favs[i]
    number <- numbers[i]
    lst[[i]] <- walk_meadow_vec(
      color = fav,
      wanted = number
    )
  }
  ## give names:
  names(lst) <- people
  ## return the list
  lst
}

Try it out:

set.seed(2020)
all_walk_list(
  people = c("Dorothy", "Toto"),
  favs = c("blue", "orange"),
  numbers = c(4, 2)
)

## $Dorothy
##  [1] "crimson" "crimson" "blue"    "blue"    "crimson" "red"     "blue"   
##  [8] "orange"  "red"     "red"     "orange"  "red"     "pink"    "red"    
## [15] "orange"  "crimson" "red"     "crimson" "crimson" "red"     "crimson"
## [22] "orange"  "crimson" "crimson" "pink"    "red"     "red"     "pink"   
## [29] "orange"  "crimson" "orange"  "orange"  "red"     "orange"  "blue"   
## 
## $Toto
## [1] "pink"   "orange" "blue"   "orange"
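
One convenience of storing the walks in a list is that each person's vector may have a different length, and the list can still be summarized afterwards. The sketch below works on a small hand-made list (walks) with the same shape as the output above; the values in it are invented.

```r
# a small hand-made list in the same shape as the output above
walks <- list(
  Dorothy = c("crimson", "blue", "red", "blue"),
  Toto = c("pink", "orange")
)

# number of flowers picked by each person:
lengths(walks)

# number of blue flowers picked by each person:
blues <- numeric(length(walks))
names(blues) <- names(walks)
for (person in names(walks)) {
  blues[person] <- sum(walks[[person]] == "blue")
}
blues
```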

9.3.4 Iterating Over a List

Lists are one-dimensional, so you can loop over them just as you would loop over an atomic vector. Sometimes this can be quite useful.

Here is a toy example. We will write a function that, when given a list of vectors, will return a vector consisting of the means of each of the vectors in the list.

means <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric()
  for ( vec in vecs ) {
    print(vec)
    results <- c(results, mean(vec, ...))
  }
  results
}

vec1 <- 1:5
vec2 <- 1:10
vec3 <- c(1:20, NA)
means(vecs = list(vec1, vec2, vec3), na.rm = TRUE)

## [1] 1 2 3 4 5
##  [1]  1  2  3  4  5  6  7  8  9 10
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 NA

## [1]  3.0  5.5 10.5

Another possibility—and one that will work a bit more quickly—is to iterate over the indices of the list of vectors:

means2 <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric(n)
  for ( i in 1:n ) {
    results[i] <- mean(vecs[[i]], ...)
  }
  results
}

means2(vecs = list(vec1, vec2, vec3), na.rm = TRUE)

## [1]  3.0  5.5 10.5
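
Looping over 1:n is safe in means2() because the function returns early when n is 0. In general, though, 1:n counts down (1, then 0) when n is 0. A common safeguard, offered here as a suggestion rather than as part of the original code, is seq_along():

```r
noVecs <- list()

1:length(noVecs)    # 1 0, so a loop would run twice by mistake
seq_along(noVecs)   # integer(0), so a loop would not run at all

passes <- 0
for (i in seq_along(noVecs)) {
  passes <- passes + 1
}
passes  # still 0
```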

9.3.5 A Note on Ellipses

The functions of the previous sub-section contained a mysterious ... argument in their definitions. This is known in R as the ellipsis argument, and it signals the possibility that one or more additional arguments may be supplied when the function is actually called.

The following function illustrates the operation of the ellipsis argument:

ellipisDemo <- function(...) {
  cat("I got the following arguments:\n\n")
  print(list(...))
}
ellipisDemo(x = 3, y = "cat", z = FALSE)

## I got the following arguments:
## 
## $x
## [1] 3
## 
## $y
## [1] "cat"
## 
## $z
## [1] FALSE

At this point in our study of R, ... is useful in two ways.

9.3.5.1 Use #1: Passing Additional Arguments to Functions “Inside”

Look again at the code for the function means2():

means2 <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric(n)
  for ( i in 1:n ) {
    results[i] <- mean(vecs[[i]], ...)
  }
  results
}

We plan to take the mean of some vectors and therefore the mean() function will be used in the body of means2(). However we would like the user to be able to decide how mean() deals with NA-values. When we include the ellipsis argument in the definition of means2() we have the option to pass its contents into mean(), and we exercise that option in the line:

results[i] <- mean(vecs[[i]], ...)

Now we can see what happens in the call:

means2(vecs = list(vec1, vec2, vec3), na.rm = TRUE)

The ellipsis argument will consist of the argument na.rm = TRUE, hence the call to mean() inside the loop is equivalent to:

results[i] <- mean(vecs[[i]], na.rm = TRUE)

Consider, on the other hand, the call:

means2(vecs = list(vec1, vec2, vec3))

Now the ellipsis is empty. In this case the code in the loop will be equivalent to:

results[i] <- mean(vecs[[i]])

As a result, mean() will use the default value of na.rm, which is FALSE. For any input-vector having NA-values, the mean will be computed as NA:

means2(vecs = list(vec1, vec2, vec3))

## [1] 3.0 5.5  NA
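
The ellipsis is not limited to na.rm: any argument that mean() understands can be passed through in the same way. The sketch below uses a compact, hypothetical variant of means2() called meansSketch(), repeated here so that the example runs on its own, and passes along the trim argument of mean() as well.

```r
# a compact, hypothetical variant of means2()
meansSketch <- function(vecs, ...) {
  results <- numeric(length(vecs))
  for (i in seq_along(vecs)) {
    # whatever arrived in ... is handed on to mean()
    results[i] <- mean(vecs[[i]], ...)
  }
  results
}

someVecs <- list(c(1, 2, 3, 100), c(2, 4, NA))
meansSketch(someVecs)                             # second mean is NA
meansSketch(someVecs, na.rm = TRUE)               # NA is dropped first
meansSketch(someVecs, trim = 0.25, na.rm = TRUE)  # extremes trimmed too
```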

9.3.5.2 Use #2: Permitting Any Number of Arguments

Another application of the ellipsis argument is in the writing of functions where the number of “primary” arguments is not determined in advance.

We have seen a few R-functions that can deal with any number of arguments. cat() is an example:

cat("argument one,", "argument two,", "and as many more as you like!")

## argument one, argument two, and as many more as you like!

With the ellipsis argument we can do this sort of thing ourselves. For example, here is a function that takes any number of vectors as arguments and determines whether the vectors are all of the same length:

sameLength <- function(...) {
  vecs <- list(...)
  numVecs <- length(vecs)
  if ( numVecs <= 1 ) {
    return(cat("Need two or more vectors."))
  }
  allSame <- TRUE
  len <- length(vecs[[1]])
  for ( i in 2:numVecs ) {
    if ( length(vecs[[i]]) != len ) {
      allSame <- FALSE
      break
    }
  }
  allSame 
}

We can give this function two or more vectors, as follows:

vec1 <- 1:3
vec2 <- 1:4
vec3 <- 1:3
sameLength(vec1, vec2, vec3)

## [1] FALSE

9.4 Investigate Your Object: `str()` and Lists

Let’s reconsider the Meetup Simulation from Section 6.5:

meetupSim <- function(reps = 10000, table = FALSE, seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  anna <- runif(reps, 0, 60)
  raj <- runif(reps, 0, 60)
  connect <- (abs(anna - raj) < 10)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(connect))
    cat("\n")
  }
  cat("The proportion of times they met was ", mean(connect), ".\n", sep = "")
}

You will recall that when the user asks for a table of results, the function prints out a table that looks like this:

## Here is a table of the results:

## connect
## FALSE  TRUE 
## 69781 30219

There are a couple of small irritations, here:

The name of the table (“connect”) appears in the output, even though it was a name that was given in the code internal to the function. As a name for the output-table, it’s not the most descriptive choice. Besides, we really don’t need a name here, because we have just cat-ed out a sentence that introduces the table.
The names for the columns (FALSE and TRUE) again pertain to features internal to the code of the function. The user should see more descriptive names.

In order to investigate how we might deal with these issues, let’s create a small table here:

logicalVector <- c(rep(TRUE, 6), rep(FALSE, 4))
tab <- table(logicalVector)
tab

## logicalVector
## FALSE  TRUE 
##     4     6

One way to deal with the column-name issues might be to isolate each table value and then repackage the values. We can access the individual table-values with sub-setting. For example, the first value is:

tab[1]

## FALSE 
##     4

Hence we could grab the values, create a vector from them, and then provide names for the vector that we like. Thus:

results <- c(tab[1], tab[2])
names(results) <- c("did not meet", "met")
results

## did not meet          met 
##            4            6

Another approach—and this is the more instructive and generally-useful procedure—is to begin by looking carefully at the structure of the problematic object:

str(tab)

##  'table' int [1:2(1d)] 4 6
##  - attr(*, "dimnames")=List of 1
##   ..$ logicalVector: chr [1:2] "FALSE" "TRUE"

We see that

the table has an attribute called dimnames;
dimnames is a list of length one;
it is a named list, and the name of its only element is logicalVector;
that element is a character vector, and its elements are the column names for the table.

If you would like to see the dimnames attribute all by itself, you can access it with the attr() function:

attr(tab, which = "dimnames")  # "which" says which attribute you want!

## $logicalVector
## [1] "FALSE" "TRUE"

You can also use attr() to set the values of an attribute. Here, we want dimnames to be a list of length one that does not have a name for its sole element. The following should do the trick:

attr(tab, which = "dimnames") <- list(c("did not meet", "met"))

Let’s see if this worked:

tab

## did not meet          met 
##            4            6

It appears to have worked very nicely! Hence we may rewrite meetupSim() as follows:

meetupSim <- function(reps = 10000, table = FALSE, seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  anna <- runif(reps, 0, 60)
  raj <- runif(reps, 0, 60)
  connect <- (abs(anna - raj) < 10)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    tab <- table(connect)
    attr(tab, which = "dimnames") <- list(c("did not meet", "met"))
    print(tab)
    cat("\n")
  }
  cat("The proportion of times they met was ", mean(connect), ".\n", sep = "")
}

Let’s try it out:

meetupSim(reps = 100000, table = TRUE, seed = 3939)

## Here is a table of the results:
## 
## did not meet          met 
##        69781        30219 
## 
## The proportion of times they met was 0.30219.

Much better!

The moral of the story is:

Make a habit of examining your objects with the str() function. Combining str() with your abilities to manipulate lists allows you to access and set pieces of the object in helpful ways.

Note: the dimnames attribute for tables and matrices is so frequently used that it has its own special function for accessing and setting: dimnames(). Other popular attributes, such as names for a vector and levels for a factor, also have dedicated access/set functions—names() and levels() respectively. But keep in mind that you can access and set the values for any attribute at all with the attr() function.
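
As a brief illustration of the dedicated functions mentioned in the note, the sketch below repeats the renaming of the small table with dimnames() instead of attr(). The object names tab2 and v are chosen only for this example.

```r
logicalVector <- c(rep(TRUE, 6), rep(FALSE, 4))
tab2 <- table(logicalVector)

# same effect as attr(tab2, which = "dimnames") <- ...
dimnames(tab2) <- list(c("did not meet", "met"))
tab2

# names() and attr() reach the same attribute of a vector:
v <- c(a = 1, b = 2)
identical(names(v), attr(v, which = "names"))  # TRUE
```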

9.4.1 Practice Exercises

Consider the following matrix:

myMat <- matrix(1:24, nrow = 4)
rownames(myMat) <- letters[1:4]
colnames(myMat) <- LETTERS[1:6]
myMat

##   A B  C  D  E  F
## a 1 5  9 13 17 21
## b 2 6 10 14 18 22
## c 3 7 11 15 19 23
## d 4 8 12 16 20 24

Find a way to change the row names of myMat to “x”, “y”, “z” and “w”, using the attr() function rather than the rownames() function.

9.4.2 Solutions to Practice Exercises

First, run str(myMat). You find that it has an attribute called dimnames that is a list of length 2. The first element of this list is the vector of row names. Hence you need to assign new row names to this element. You can do so as follows:
```
attr(myMat, which = "dimnames")[[1]] <- c("x", "y", "z", "w")
myMat
```
```
##   A B  C  D  E  F
## x 1 5  9 13 17 21
## y 2 6 10 14 18 22
## z 3 7 11 15 19 23
## w 4 8 12 16 20 24
```
It worked!

Glossary

List: A heterogeneous vector; that is, a vector whose elements can be any sort of R-object.

Exercises

We are given the following list:
```
lst <- list(yabba = letters,
            dabba = list(x = LETTERS,
                         y = 1:10),
            do = bcscr::m111survey)
```
One way to access the letter “b” in the first element of lst is as follows:
```
lst$yabba[2]
```
```
## [1] "b"
```
Another way is:
```
lst[[1]][2]
```
```
## [1] "b"
```
For each of the following objects, find at least two ways to access it within lst:
- the vector of letters from “c” to “j”;
- the capital letter “F”;
- the vector of numbers from 1 to 10;
- the heights of the five tallest individuals in m111survey.
Write a function called goodStats() that, when given a vector of numerical values, computes the mean, median and standard deviation of the values, and returns these values in a list. The function should take two parameters:
- x: the vector of numerical values;
- ...: the ellipsis, which allows the user to pass in additional arguments.
The list returned should name each of the three quantities:
- the name of the mean should be mean;
- the name of the standard deviation should be sd;
- the name of the median should be median.
Typical examples of use should look like this:
```
vec <- 1:5
goodStats(x = vec)
```
```
## $mean
## [1] 3
## 
## $sd
## [1] 1.581139
## 
## $median
## [1] 3
```
```
vec <- c(3, 7, 9, 11, NA)
myStats <- goodStats(x = vec, na.rm = TRUE)
myStats$mean
```
```
## [1] 7.5
```
