9 Lists
In this chapter we will study lists, another important data structure in R.
9.1 Introduction to Lists
So far the vectors that we have met have all been atomic, meaning that they can hold only one type of value. Hence we deal with vectors of type integer, or of type double, or of type character, and so on.
A list is a special kind of vector. Like any other vector it is one-dimensional, but unlike an atomic vector it can contain objects of any sort: atomic vectors, functions, even other lists! We say, therefore, that lists are heterogeneous vectors.
The most direct way to create a list is with the function list(). Let’s make a few lists:
lst1 <- list(name = "Dorothy", age = 12)
df <- data.frame(x = c(10, 20, 30), y = letters[1:3])
lst2 <- list(vowels = c("a", "e", "i", "o", "u"),
             myFrame = df)
lst3 <- list(nums = 10:20,
             bools = c(TRUE, FALSE, FALSE),
             george = lst1)
Note that the elements of our three lists are not objects of a single data type. Note also that lst3 actually contains lst1 as one of its elements.
When you call list() to create a list, you have the option to assign a name to one or more of the elements. In the code above we chose, for all three of our lists, to assign a name to each element.
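Naming is optional element by element. The following is a minimal sketch (the list partlyNamed is a made-up name, not one used elsewhere in this chapter) of what R records when only some of the elements are given names:

```r
# A list in which only the first and last elements are named.
# (partlyNamed is a hypothetical name used only in this sketch.)
partlyNamed <- list(first = 1:3, "no name here", last = TRUE)

# Elements that were not named get the empty string as their name:
names(partlyNamed)
## [1] "first" ""      "last"
```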
Let’s print out a list to the console. We’ll choose lst1, since it’s rather small:
lst1
## $name
## [1] "Dorothy"
##
## $age
## [1] 12
Note that the name of each element appears before the element itself is printed out, and that the names are preceded by dollar signs. This is a hint that you can access a single member of the list in a way similar to the frame$variable format for data frames:
lst1$age
## [1] 12
You can make an empty list, too:
emptyList <- list()
This is useful when you want to build up a list gradually, but you do not yet know what will go into it.
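As a small illustration of building a list gradually, here is a sketch that starts from an empty list and adds one named element on each pass through a loop. The names squares, n1, n2 and n3 are invented for the example.

```r
# Start with an empty list and add elements one at a time.
squares <- list()
for ( i in 1:3 ) {
  elementName <- paste0("n", i)    # "n1", "n2", "n3"
  squares[[elementName]] <- i^2    # add (or overwrite) the element with this name
}

length(squares)
## [1] 3
squares$n2
## [1] 4
```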
9.1.1 Practice Exercises

Make a list called ozStuff. The list should contain three elements:
 The sequence of even numbers from 4 to 100. Its name should be lion.
 The uppercase letters of the alphabet. Its name should be scarecrow.
 The data frame alcohol from the fuel package. Its name should be wizard.

Suppose that ozStuff has been created in the previous problem. Describe in your own words what the following expression does:
9.1.2 Solutions to the Practice Exercises

Here’s one way to do it:
One way to describe it is as follows: the expression creates a new list consisting of all the elements of ozStuff together with a new element called dorothy (which is the string "Kansas"), and then binds the name “ozStuff” to that new four-element list.
9.2 Subsetting and Accessing
You can subset lists in the same way that you subset a vector: simply use the [ subsetting operator. Let’s pick out the first two elements of lst3:
lst3[1:2]
## $nums
## [1] 10 11 12 13 14 15 16 17 18 19 20
##
## $bools
## [1] TRUE FALSE FALSE
We get a new list consisting of the desired two elements.
Suppose we want to access just one element from lst3: the numbers, for instance. We could try this:
justNumbers <- lst3[1]
justNumbers
## $nums
## [1] 10 11 12 13 14 15 16 17 18 19 20
Now suppose that we want to access the third number in the nums vector. You might think this would work fine:
justNumbers[3]
## $<NA>
## NULL
Wait a minute! The third number in nums is 12, so why are we getting NA?
Look carefully again at the printout for justNumbers:
justNumbers
## $nums
## [1] 10 11 12 13 14 15 16 17 18 19 20
The $nums gives us the clue: justNumbers is not just the vector nums; in fact it’s not an atomic vector at all. It is a list whose only element is a vector with the name nums. Another way to see this is to check the length of justNumbers:
length(justNumbers)
## [1] 1
The fact is that the subsetting operator [, applied to lists, always returns a list. If you want access to an individual element of a list, then you need to use the double-bracket [[ operator:
reallyJustNumbers <- lst3[[1]]
reallyJustNumbers
## [1] 10 11 12 13 14 15 16 17 18 19 20
Of course if an element of a list is named, then you may also access it with the dollar sign:
lst3$nums
## [1] 10 11 12 13 14 15 16 17 18 19 20
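The three access styles can be compared side by side. The sketch below rebuilds a shortened version of lst3 so that it runs on its own; the names asList and asVector are invented for the comparison.

```r
# A shortened copy of lst3, rebuilt here so that the sketch is self-contained.
lst3 <- list(nums = 10:20, bools = c(TRUE, FALSE, FALSE))

asList   <- lst3[1]      # single bracket: a list of length one
asVector <- lst3[[1]]    # double bracket: the integer vector itself

is.list(asList)
## [1] TRUE
is.list(asVector)
## [1] FALSE

# The third number can be reached only through the vector:
asVector[3]
## [1] 12
lst3$nums[3]             # the same element, reached by name
## [1] 12
lst3[["nums"]][3]        # [[ also accepts a name in quotes
## [1] 12
```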
From time to time it’s useful to “flatten out” a list into a vector of values of its elements. This is accomplished by the function unlist():
unlist(lst1)
## name age
## "Dorothy" "12"
As the example above shows, you have to exercise caution with unlist(). Since unlist() returns an atomic vector, when it encounters values of different types it has to coerce them to be of the same type. In the competition between double and character types, character wins, so you end up with a vector of strings.
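To see the coercion at work, it may help to compare a list whose elements are all numeric with one that mixes numbers and strings. This is only a sketch; the lists allNumbers, mixed and longer are made up for the purpose.

```r
# All-numeric elements: the flattened vector stays numeric.
allNumbers <- list(a = 1.5, b = 2L, c = 3)
flatNumbers <- unlist(allNumbers)
typeof(flatNumbers)
## [1] "double"

# One string among the elements: everything becomes a string.
mixed <- list(name = "Dorothy", age = 12)
flatMixed <- unlist(mixed)
typeof(flatMixed)
## [1] "character"

# An element of length two contributes two values, with numbered names.
longer <- unlist(list(x = 1:2, y = 3))
names(longer)
## [1] "x1" "x2" "y"
```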
9.2.1 Practice Exercises
These exercises involve the following list:
grabBag <- list(letters = letters,
                as.character(1:10),
                df = bcscr::fuel)
Observe that the second element of this list was NOT given a name.
 Describe in words what grabBag[2:3] is.
 Describe in words what grabBag[3] is.
 Describe in words what grabBag[[3]] is.
 Find two ways to access the letter "d" in the first element of grabBag.
 Find a way to access the last five elements of the second element of grabBag.
 Find two ways to access the variable speed in the data frame in grabBag.
9.2.2 Solutions to the Practice Exercises
grabBag[2:3] is a list containing two elements: the vector of whole numbers from 1 to 10 turned into strings, and the data frame fuel from the bcscr package. grabBag[3] is a list containing just one element: the data frame fuel from the bcscr package. grabBag[[3]] is the data frame fuel from the bcscr package.
Here are two ways:
grabBag$letters[4]
grabBag[[1]][4]

Try this:
grabBag[[2]][6:10]

Here are two ways:
grabBag$df$speed
grabBag[[3]]$speed
9.3 Splitting
Sometimes it is useful to split a vector or data frame into pieces according to the value of a variable. For example, from m111survey we might like to have separate data frames for each of the three seating preferences. We can accomplish this with the split() function:
bySeat <- split(m111survey, f = m111survey$seat)
If you run the command str(bySeat), you find that bySeat is a list consisting of three data frames:
 1_front: the frame of all subjects who prefer the Front;
 2_middle: the frame of all subjects who prefer the Middle;
 3_back: the frame of all subjects who prefer the Back.
Now you can carry on three separate analyses, working with one frame at a time.
There is a pitfall of which you should be aware. If you try to access any one of the frames by its name, you will get an error:
bySeat$1_front
## Error: unexpected numeric constant in "bySeat$1"
The reason is that variable names cannot begin with a number! You have two options, here. You could access a single frame by using the name in quotes:
bySeat[["1_front"]]
Your second option is to use the index of the element you want:
bySeat[[1]]
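The same pattern can be tried without m111survey. The sketch below uses mtcars, a data frame that ships with R, and splits it by the number of cylinders; the names byCyl, fourA, fourB and fourC are invented here. Because the resulting names ("4", "6", "8") also begin with a digit, the same pitfall arises. The last line shows a further possibility that is not discussed above: wrapping the name in backticks after the dollar sign.

```r
# mtcars ships with R; its cyl variable takes the values 4, 6 and 8.
byCyl <- split(mtcars, f = mtcars$cyl)

names(byCyl)
## [1] "4" "6" "8"

fourA <- byCyl[["4"]]   # by name, in quotes
fourB <- byCyl[[1]]     # by position
fourC <- byCyl$`4`      # by name, wrapped in backticks

identical(fourA, fourB)
## [1] TRUE
identical(fourA, fourC)
## [1] TRUE
```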
9.4 Returning Multiple Values
Lists combine many different sorts of objects into one object. This makes them very useful in the context of certain functions.
Consider, for example, the drunken-turtle simulation from Section 6.8:
drunkenSim <- function(steps = 1000, reps = 10000, close = 0.5,
                       seed = NULL, table = FALSE) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  returns <- numeric(reps)
  for (i in 1:reps) {
    angle <- runif(steps, 0 , 2*pi)
    xSteps <- cos(angle)
    ySteps <- sin(angle)
    x <- cumsum(xSteps)
    y <- cumsum(ySteps)
    dist <- sqrt(x^2 + y^2)
    # a "close return" is a step that ends within `close` units of the origin
    closeReturn <- (dist < close)
    returns[i] <- sum(closeReturn)
  }
  if ( table ) {
    cat("Here is a table of the number of close returns:\n\n")
    tab <- prop.table(table(returns))
    print(tab)
    cat("\n")
  }
  cat("The average number of close returns was: ",
      mean(returns), ".", sep = "")
}
Suppose that we would like to store several of the results of the simulation:
 the vector of the number of close returns on each repetition;
 the table made from the close-returns vector;
 the mean number of returns.
Unfortunately a function can only return one object.
The solution to this problem is to make a list of the three objects we want, and then return the list. We can rewrite the function so as to make all output to the console optional. The function will construct the list and return it invisibly.
drunkenSimList <- function(steps = 1000, reps = 10000, close = 0.5,
                           seed = NULL, verbose = FALSE) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  # get the returns:
  returns <- numeric(reps)
  for (i in 1:reps) {
    angle <- runif(steps, 0 , 2*pi)
    xSteps <- cos(angle)
    ySteps <- sin(angle)
    x <- cumsum(xSteps)
    y <- cumsum(ySteps)
    dist <- sqrt(x^2 + y^2)
    # a "close return" is a step that ends within `close` units of the origin
    closeReturn <- (dist < close)
    returns[i] <- sum(closeReturn)
  }
  # compute the table and the mean:
  tableReturns <- table(returns)
  meanReturns <- mean(returns)
  # handle output to console if user wants it
  if ( verbose ) {
    cat("Here is a table of the number of close returns:\n\n")
    print(prop.table(tableReturns))
    cat("\n")
    cat("The average number of close returns was: ",
        meanReturns, ".", sep = "")
  }
  # assemble the desired three items into a list
  # (for convenience, name the items)
  results <- list(tableReturns = tableReturns,
                  meanReturns = meanReturns,
                  returns = returns)
  # return the list
  invisible(results)
}
Now we can run the function simply to acquire the simulation results for later use:
simResults <- drunkenSimList(seed = 3939)
We can use any of the results at any time and in any way we like:
cat("On the first ten repetitions, the number of close returns were:\n\n\t",
simResults$returns[1:10])
## On the first ten repetitions, the number of close returns were:
##
## 0 6 4 4 2 0 2 5 2 4
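The same return-a-list pattern can be seen in miniature. The following sketch uses a made-up helper, rangeInfo(), which is not part of the simulation above; it packs three results into a named list and returns the list invisibly.

```r
# rangeInfo() is a hypothetical helper written only for this sketch.
rangeInfo <- function(x) {
  results <- list(low = min(x),
                  high = max(x),
                  spread = max(x) - min(x))
  invisible(results)   # returned, but not printed automatically
}

rangeInfo(c(4, 9, 1))           # nothing appears in the console
info <- rangeInfo(c(4, 9, 1))   # ... but the list can be stored
info$spread
## [1] 8
```

Returning invisibly keeps the console quiet when the function is called on its own, while still letting the caller store the results and use them later.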
9.5 Iterating Over a List
Lists are one-dimensional, so you can loop over them just as you would loop over an atomic vector. Sometimes this can be quite useful.
Here is a toy example. We will write a function that, when given a list of vectors, will return a vector consisting of the means of each of the vectors in the list.
means <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric()
  for ( vec in vecs ) {
    print(vec)
    results <- c(results, mean(vec, ...))
  }
  results
}
vec1 <- 1:5
vec2 <- 1:10
vec3 <- c(1:20, NA)
means(vecs = list(vec1, vec2, vec3), na.rm = TRUE)
## [1] 1 2 3 4 5
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 NA
## [1] 3.0 5.5 10.5
Another possibility—and one that will work a bit more quickly—is to iterate over the indices of the list of vectors:
means2 <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric(n)
  for ( i in 1:n ) {
    results[i] <- mean(vecs[[i]], ...)
  }
  results
}
means2(vecs = list(vec1, vec2, vec3), na.rm = TRUE)
## [1] 3.0 5.5 10.5
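One detail of looping over indices deserves care: when n is 0, the expression 1:n produces the vector 1, 0 rather than an empty sequence, so an unguarded loop would run twice. The sketch below shows a hypothetical variant, meansByIndex(), that uses seq_along() so that an empty list causes no iterations at all. It is offered as one possible design and is not the function from the text.

```r
# meansByIndex() is a hypothetical variant written only for this sketch.
meansByIndex <- function(vecs = list(), ...) {
  results <- numeric(length(vecs))   # set aside room for the results
  for ( i in seq_along(vecs) ) {     # zero passes when vecs is empty
    results[i] <- mean(vecs[[i]], ...)
  }
  names(results) <- names(vecs)      # carry over element names, if any
  results
}

someMeans <- meansByIndex(list(a = 1:5, b = c(1:20, NA)), na.rm = TRUE)
someMeans
##    a    b 
##  3.0 10.5 
noMeans <- meansByIndex(list())      # an empty numeric vector, no error
length(noMeans)
## [1] 0
```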
9.6 A Note on Ellipses
The functions of the previous section contained a mysterious ... argument in their definitions. This is known in R as the ellipsis argument, and it signals the possibility that one or more additional arguments may be supplied when the function is actually called.
The following function illustrates the operation of the ellipsis argument:
ellipisDemo <- function(...) {
  cat("I got the following arguments:\n\n")
  print(list(...))
}
ellipisDemo(x = 3, y = "cat", z = FALSE)
## I got the following arguments:
##
## $x
## [1] 3
##
## $y
## [1] "cat"
##
## $z
## [1] FALSE
At this point in our study of R, ... is useful in two ways.
9.6.1 Use #1: Passing Additional Arguments to Functions “Inside”
Look again at the code for the function means2():
means2 <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric(n)
  for ( i in 1:n ) {
    results[i] <- mean(vecs[[i]], ...)
  }
  results
}
We plan to take the mean of some vectors and therefore the mean() function will be used in the body of means2(). However we would like the user to be able to decide how mean() deals with NA values. When we include the ellipsis argument in the definition of means2() we have the option to pass its contents into mean(), and we exercise that option in the line:
results[i] <- mean(vecs[[i]], ...)
Now we can see what happens in the call:
means2(vecs = list(vec1, vec2, vec3), na.rm = TRUE)
The ellipsis argument will consist of the argument na.rm = TRUE, hence the call to mean() inside the loop is equivalent to:
results[i] <- mean(vecs[[i]], na.rm = TRUE)
Consider, on the other hand, the call:
means2(vecs = list(vec1, vec2, vec3))
Now the ellipsis is empty. In this case the code in the loop will be equivalent to:
results[i] <- mean(vecs[[i]])
As a result, mean() will use the default value of na.rm, which is FALSE. For any input vector having NA values, the mean will be computed as NA:
means2(vecs = list(vec1, vec2, vec3))
## [1] 3.0 5.5 NA
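The pass-through idea is not tied to means2(). Here is a minimal sketch with a made-up wrapper, roundedMean(), which keeps one argument of its own (digits) and hands everything else on to mean(). The names roundedMean and scores are invented for the example.

```r
# roundedMean() is a hypothetical wrapper used only for illustration.
roundedMean <- function(x, digits = 1, ...) {
  # whatever the caller supplies beyond x and digits travels on to mean()
  round(mean(x, ...), digits = digits)
}

scores <- c(3, 7, 9, 11, NA)
roundedMean(scores)                 # ellipsis empty, so na.rm is FALSE
## [1] NA
roundedMean(scores, na.rm = TRUE)   # na.rm = TRUE is passed on to mean()
## [1] 7.5
```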
9.6.2 Use #2: Permitting Any Number of Arguments
Another application of the ellipsis argument is in the writing of functions where the number of “primary” arguments is not determined in advance.
We have seen a few R functions that can deal with any number of arguments. cat() is an example:
cat("argument one,", "argument two,", "and as many more as you like!")
## argument one, argument two, and as many more as you like!
With the ellipsis argument we can do this sort of thing ourselves. For example, here is a function that takes any number of vectors as arguments and determines whether the vectors are all of the same length:
sameLength <- function(...) {
  vecs <- list(...)
  numVecs <- length(vecs)
  if ( numVecs <= 1 ) {
    return(cat("Need two or more vectors."))
  }
  allSame <- TRUE
  len <- length(vecs[[1]])
  for ( i in 2:numVecs ) {
    if ( length(vecs[[i]]) != len ) {
      allSame <- FALSE
      break
    }
  }
  allSame
}
We can give this function two or more vectors, as follows:
vec1 <- 1:3
vec2 <- 1:4
vec3 <- 1:3
sameLength(vec1, vec2, vec3)
## [1] FALSE
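The key step in sameLength() is the line vecs <- list(...), which gathers however many arguments were supplied into a single list that can then be looped over. A second sketch of the same technique, using a made-up function totalLength(), adds up the lengths of all the vectors it is given:

```r
# totalLength() is a hypothetical function written only for this sketch.
totalLength <- function(...) {
  vecs <- list(...)      # collect all of the arguments into one list
  total <- 0
  for ( vec in vecs ) {
    total <- total + length(vec)
  }
  total
}

totalLength()                               # no arguments at all
## [1] 0
totalLength(1:3, letters, c(TRUE, FALSE))   # 3 + 26 + 2
## [1] 31
```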
9.7 Investigate Your Object: str() and Lists
Let’s reconsider the Meetup Simulation from Section 6.4:
meetupSim <- function(reps = 10000, table = FALSE, seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  anna <- runif(reps, 0, 60)
  raj <- runif(reps, 0, 60)
  connect <- (abs(anna - raj) < 10)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(connect))
    cat("\n")
  }
  cat("The proportion of times they met was ", mean(connect), ".\n", sep = "")
}
You will recall that when the user asks for a table of results, the function prints out a table that looks like this:
## Here is a table of the results:
## connect
## FALSE TRUE
## 69781 30219
There are a couple of small irritations, here:
 The name of the table (“connect”) appears in the output, even though it was a name that was given in the code internal to the function. As a name for the output table, it’s not the most descriptive choice. Besides, we really don’t need a name here, because we have just used cat() to print a sentence that introduces the table.
 The names for the columns (FALSE and TRUE) again pertain to features internal to the code of the function. The user should see more descriptive names.
In order to investigate how we might deal with these issues, let’s create a small table here:
## logicalVector
## FALSE TRUE
## 4 6
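The code that produced this table is not shown. One way to obtain a table with the same printout, assuming only that logicalVector holds four FALSE values and six TRUE values (their actual order in the original is unknown), is sketched here:

```r
# Assumed construction: any logical vector with four FALSE values and
# six TRUE values would give the same table.
logicalVector <- c(rep(FALSE, 4), rep(TRUE, 6))
tab <- table(logicalVector)
tab
## logicalVector
## FALSE  TRUE 
##     4     6 
```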
One way to deal with the column-name issues might be to isolate each table value and then repackage the values. We can access the individual table values with subsetting. For example, the first value is:
tab[1]
## FALSE
## 4
Hence we could grab the values, create a vector from them, and then provide names for the vector that we like. Thus:
## did not meet met
## 4 6
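The repackaging code itself is not shown above. A minimal sketch of one way to do it follows; it assumes a table built from four FALSE and six TRUE values, and it uses as.vector() to strip the table down to its bare counts, which may differ from the approach the original code took. The name results is invented for the example.

```r
logicalVector <- c(rep(FALSE, 4), rep(TRUE, 6))   # assumed data
tab <- table(logicalVector)

results <- as.vector(tab)                     # just the counts: 4 and 6
names(results) <- c("did not meet", "met")    # names that we like better
results
## did not meet          met 
##            4            6 
```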
Another approach (and this is the more instructive and generally useful procedure) is to begin by looking carefully at the structure of the problematic object:
str(tab)
## 'table' int [1:2(1d)] 4 6
##  attr(*, "dimnames")=List of 1
## ..$ logicalVector: chr [1:2] "FALSE" "TRUE"
We see that
 the table has an attribute called dimnames.
 dimnames is a list of length one.
 It is a named list. The name of its only element is logicalVector.
 The elements of this vector are the column names for the table.
If you would like to see the dimnames attribute all by itself, you can access it with the attr() function:
attr(tab, which = "dimnames") # "which" says which attribute you want!
## $logicalVector
## [1] "FALSE" "TRUE"
You can also use attr() to set the values of an attribute. Here, we want dimnames to be a list of length one that does not have a name for its sole element. The following should do the trick:
attr(tab, which = "dimnames") <- list(c("did not meet", "met"))
Let’s see if this worked:
tab
## did not meet met
## 4 6
It appears to have worked very nicely! Hence we may rewrite meetupSim() as follows:
meetupSim <- function(reps = 10000, table = FALSE, seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  anna <- runif(reps, 0, 60)
  raj <- runif(reps, 0, 60)
  connect <- (abs(anna - raj) < 10)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    tab <- table(connect)
    attr(tab, which = "dimnames") <- list(c("did not meet", "met"))
    print(tab)
    cat("\n")
  }
  cat("The proportion of times they met was ", mean(connect), ".\n", sep = "")
}
Let’s try it out:
meetupSim(reps = 100000, table = TRUE, seed = 3939)
## Here is a table of the results:
##
## did not meet          met
##        69781        30219
##
## The proportion of times they met was 0.30219.
Much better!
The moral of the story is:
Make a habit of examining your objects with the str() function. Combining str() with your abilities to manipulate lists allows you to access and set pieces of the object in helpful ways.
Note: the dimnames attribute for tables and matrices is so frequently used that it has its own special function for accessing and setting: dimnames(). Other popular attributes, such as names for a vector and levels for a factor, also have dedicated access/set functions: names() and levels() respectively. But keep in mind that you can access and set the values for any attribute at all with the attr() function.
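As a brief sketch of these dedicated functions in use, the code below repeats the renaming of the table columns with dimnames() in place of attr(), and then sets names and levels in the same style. The table data are assumed (four FALSE and six TRUE values), and the objects ages and sizes are made up for the example.

```r
logicalVector <- c(rep(FALSE, 4), rep(TRUE, 6))   # assumed data
tab <- table(logicalVector)

# dimnames() reads the very same attribute that attr() reads:
sameThing <- identical(dimnames(tab), attr(tab, which = "dimnames"))
sameThing
## [1] TRUE

# ... and it can set the attribute, too:
dimnames(tab) <- list(c("did not meet", "met"))

# names() and levels() follow the same get-or-set pattern:
ages <- c(12, 40)
names(ages) <- c("younger", "older")
sizes <- factor(c("lo", "hi", "lo"))   # the levels are "hi" and "lo"
levels(sizes) <- c("high", "low")
as.character(sizes)
## [1] "low"  "high" "low" 
```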
9.7.1 Practice Exercises

Consider the following matrix:
myMat <- matrix(1:24, nrow = 4)
rownames(myMat) <- letters[1:4]
colnames(myMat) <- LETTERS[1:6]
myMat
##   A B  C  D  E  F
## a 1 5  9 13 17 21
## b 2 6 10 14 18 22
## c 3 7 11 15 19 23
## d 4 8 12 16 20 24
Find a way to change the row names of myMat to “x,” “y,” “z” and “w,” using the attr() function rather than the rownames() function.
9.7.2 Solutions to Practice Exercises

First, run str(myMat). You find that it has an attribute called dimnames that is a list of length 2. The first element of this list is the vector of row names. Hence you need to assign new row names to this element. You can do so as follows:
##   A B  C  D  E  F
## x 1 5  9 13 17 21
## y 2 6 10 14 18 22
## z 3 7 11 15 19 23
## w 4 8 12 16 20 24
It worked!
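The assignment itself does not appear in the solution above. One way to carry it out that fits the description (it replaces only the first element of the dimnames list) is sketched below; this is a reconstruction and may not match the original code exactly.

```r
myMat <- matrix(1:24, nrow = 4)
rownames(myMat) <- letters[1:4]
colnames(myMat) <- LETTERS[1:6]

# Replace only the first element (the row names) of the dimnames list;
# the second element (the column names) is left alone.
attr(myMat, which = "dimnames")[[1]] <- c("x", "y", "z", "w")

rownames(myMat)
## [1] "x" "y" "z" "w"
```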
Exercises

We are given the following list:
lst <- list(yabba = letters, dabba = list(x = LETTERS, y = 1:10), do = bcscr::m111survey)
One way to access the letter “b” in the first element of lst is as follows:
lst$yabba[2]
## [1] "b"
Another way is:
lst[[1]][2]
## [1] "b"
For each of the following objects, find at least two ways to access it within lst:
 the vector of letters from “c” to “j”;
 the capital letter “F”;
 the vector of numbers from 1 to 10;
 the heights of the five tallest individuals in m111survey.

Write a function called goodStats() that, when given a vector of numerical values, computes the mean, median and standard deviation of the values, and returns these values in a list. The function should take two parameters:
 x: the vector of numerical values;
 ...: the ellipsis, which allows the user to pass in additional arguments.
The list returned should name each of the three quantities:
 the name of the mean should be mean;
 the name of the standard deviation should be sd;
 the name of the median should be median.
Typical examples of use should look like this:
vec <- 1:5
goodStats(x = vec)
## $mean
## [1] 3
##
## $sd
## [1] 1.581139
##
## $median
## [1] 3
vec <- c(3, 7, 9, 11, NA)
myStats <- goodStats(x = vec, na.rm = TRUE)
myStats$mean
## [1] 7.5
