9 Lists
In this chapter we will study lists, another important data structure in R.
9.1 Introduction to Lists
So far the vectors that we have met have all been atomic, meaning that they can hold only one type of value. Hence we deal with vectors of type integer, or of type double, or of type character, and so on.
A list is a special kind of vector. Like any other vector it is one-dimensional, but unlike an atomic vector it can contain objects of any sort: atomic vectors, functions—even other lists! We say, therefore, that lists are heterogeneous vectors.
The most direct way to create a list is with the function list(). Let’s make a few lists:
lst1 <- list(
  name = "Dorothy", age = 12
)
df <- data.frame(
  x = c(10, 20, 30),
  y = letters[1:3]
)
lst2 <- list(
  vowels = c("a", "e", "i", "o", "u"),
  myFrame = df
)
lst3 <- list(
  nums = 10:20,
  bools = c(TRUE, FALSE, FALSE),
  george = lst1
)
Note that the elements of our three lists are not objects of a single data type. Note also that lst3 actually contains lst1 as one of its elements.
When you call list() to create a list, you have the option to assign a name to one or more of the elements. In the code above we chose, for all three of our lists, to assign a name to each element.
Let’s print out a list to the console. We’ll choose lst1, since it’s rather small:
lst1
$name
[1] "Dorothy"
$age
[1] 12
Note that the name of each element appears before the element itself is printed out, and that the names are preceded by dollar signs. This is a hint that you can access a single member of the list in a way similar to the frame$variable format for data frames:
lst1$age
[1] 12
You can make an empty list, too:
emptyList <- list()
This is useful when you want to build up a list gradually, but you do not yet know how long it will be.
If you know how long your list will be, then do something like this:
initialList <- vector(mode = "list", length = 5)
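As a minimal sketch of both approaches (the names squares and cubes are made up for illustration), the first loop below grows an empty list one element at a time, while the second fills in a list whose length was fixed in advance. The double-bracket operator [[ used here is introduced in Section 9.2.

```r
# grow an empty list one element at a time
squares <- list()
for (i in 1:3) {
  squares[[i]] <- i^2
}
length(squares)

# fill in a list whose length is known in advance
cubes <- vector(mode = "list", length = 3)
for (i in 1:3) {
  cubes[[i]] <- i^3
}
length(cubes)
```

Both lists end up with three elements; the pre-allocated version simply avoids having R extend the list on every pass through the loop.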
9.1.1 Practice Exercises
9.2 Subsetting and Accessing
You can subset lists in the same way that you subset a vector: simply use the [ sub-setting operator. Let’s pick out the first two elements of lst3:
lst3[1:2]
$nums
[1] 10 11 12 13 14 15 16 17 18 19 20
$bools
[1] TRUE FALSE FALSE
We get a new list consisting of the desired two elements.
Suppose we want to access just one element from lst3: the numbers, for instance. We could try this:
justNumbers <- lst3[1]
justNumbers
$nums
[1] 10 11 12 13 14 15 16 17 18 19 20
Now suppose that we want to access the third number in the nums vector. You might think this would work fine:
justNumbers[3]
$<NA>
NULL
Wait a minute! The third number in nums is 12, so why are we getting NA?
Look carefully again at the printout for justNumbers:
justNumbers
$nums
[1] 10 11 12 13 14 15 16 17 18 19 20
The $nums gives us the clue: justNumbers is not just the vector nums; in fact it’s not an atomic vector at all. It is a list whose only element is a vector with the name nums. Since that list has only one element, asking for its third element yields nothing useful. Another way to see this is to check the length of justNumbers:
length(justNumbers)
[1] 1
The fact is that the sub-setting operator [, applied to lists, always returns a list. If you want access to an individual element of a list, then you need to use the double-bracket [[ operator:
reallyJustNumbers <- lst3[[1]]
reallyJustNumbers
[1] 10 11 12 13 14 15 16 17 18 19 20
Of course if an element of a list is named, then you may also access it with the dollar sign:
lst3$nums
[1] 10 11 12 13 14 15 16 17 18 19 20
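To return to the original goal of getting the third number in nums, here is a short sketch that contrasts the two operators. The list is re-created so that the block runs on its own, with the contents of lst1 written out directly for the george element.

```r
lst3 <- list(
  nums = 10:20,
  bools = c(TRUE, FALSE, FALSE),
  george = list(name = "Dorothy", age = 12)
)

class(lst3[1])      # "list": single brackets return a list
class(lst3[[1]])    # "integer": double brackets return the element itself

# two equivalent ways to reach the third number in nums:
lst3[[1]][3]
lst3$nums[3]
```

Both of the last two expressions evaluate to 12, because each one first extracts the vector itself and only then subsets it.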
From time to time it’s useful to “flatten out” a list into a vector of values of its elements. This is accomplished by the function unlist():
unlist(lst1)
name age
"Dorothy" "12"
As the example above shows, you have to exercise caution with unlist(). Since unlist() returns an atomic vector, when it encounters values of different types it has to coerce them to the same type. In the competition between double and character types, character wins, so you end up with a vector of strings.
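The following sketch shows the coercion at work on three small lists. The lists allNumbers, mixed and withString are hypothetical examples, not objects defined elsewhere in this chapter.

```r
# all elements numeric: the result stays numeric
allNumbers <- list(a = 1, b = 2.5, c = 10)
unlist(allNumbers)
typeof(unlist(allNumbers))   # "double"

# logical mixed with numeric: the logical value is coerced to a number
mixed <- list(flag = TRUE, count = 3)
unlist(mixed)

# a single character element forces everything to character
withString <- list(name = "Dorothy", age = 12)
typeof(unlist(withString))   # "character"
```

Checking typeof() on the result is a quick way to find out whether a coercion you did not intend has taken place.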
9.2.1 Practice Exercises
These exercises involve the following list:
Observe that the second element of this list was NOT given a name.
9.3 Some Applications of Lists
9.3.1 Splitting
Sometimes it is useful to split a vector or data frame into pieces according to the value of a variable. For example, from m111survey (see Data Table 7.1) we might like to have separate data frames for each of the three seating preferences. We can accomplish this with the split() function:
bySeat <- split(m111survey, f = m111survey$seat)
If you run the command str(bySeat), you find that bySeat is a list consisting of three data frames:
- 1_front: the frame of all subjects who prefer the Front;
- 2_middle: the frame of all subjects who prefer the Middle;
- 3_back: the frame of all subjects who prefer the Back.
Now you can carry on three separate analyses, working with one frame at a time.
There is a pitfall of which you should be aware. If you try to access any one of the frames by its name, you will get an error:
bySeat$1_front
Error in parse(text = input): <text>:1:8: unexpected numeric constant
1: bySeat$1
^
The reason is that variable names cannot begin with a number! You have two options, here. You could access a single frame by using the name in quotes:
bySeat[["1_front"]]
Your second option is to use the index of the element you want:
bySeat[[1]]
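Here is a self-contained sketch of the same idea. It uses a small made-up data frame, toySurvey, as a stand-in for m111survey, so the heights and seating preferences are assumptions for illustration only. It also shows that wrapping the name in backticks lets the dollar sign work with a name that starts with a digit.

```r
# hypothetical stand-in for m111survey, with only two columns
toySurvey <- data.frame(
  height = c(65, 70, 62, 68, 71, 64),
  seat = factor(c("1_front", "2_middle", "3_back",
                  "1_front", "3_back", "2_middle"))
)
bySeat <- split(toySurvey, f = toySurvey$seat)
names(bySeat)

# access by quoted name or by index:
bySeat[["1_front"]]
bySeat[[1]]

# backticks make the dollar sign work with a non-syntactic name:
bySeat$`1_front`
```

All three access expressions return the same data frame, namely the rows for subjects who prefer the front.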
9.3.2 Returning Multiple Values
Lists combine many different sorts of objects into one object. This makes them very useful in the context of certain functions.
Consider, for example, the drunken-turtle simulation from Section 8.5.3:
drunkenSim <- function(steps = 1000, reps = 10000, close = 0.5,
                       seed = NULL, table = FALSE) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }

  returns <- numeric(reps)

  for (i in 1:reps) {
    angle <- runif(steps, 0 , 2*pi)
    xSteps <- cos(angle)
    ySteps <- sin(angle)

    x <- cumsum(xSteps)
    y <- cumsum(ySteps)

    dist <- sqrt(x^2 + y^2)
    # use the close argument rather than a hard-coded 0.5
    closeReturn <- (dist < close)
    returns[i] <- sum(closeReturn)
  }

  if ( table ) {
    cat("Here is a table of the number of close returns:\n\n")
    tab <- prop.table(table(returns))
    print(tab)
    cat("\n")
  }
  cat("The average number of close returns was: ",
      mean(returns), ".", sep = "")
}
Suppose that we would like to store several of the results of the simulation:
- the vector of the number of close returns on each repetition;
- the table made from the close-returns vector;
- the mean number of returns.
Unfortunately a function can only return one object.
The solution to our problem is to make a list of the three objects we want, and then return the list. We can rewrite the function so as to make all output to the console optional. The function will construct the list and return it invisibly.
drunkenSimList <- function(steps = 1000, reps = 10000, close = 0.5,
                           seed = NULL, verbose = FALSE) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }

  # get the returns:
  returns <- numeric(reps)
  for (i in 1:reps) {
    angle <- runif(steps, 0 , 2*pi)
    xSteps <- cos(angle)
    ySteps <- sin(angle)

    x <- cumsum(xSteps)
    y <- cumsum(ySteps)

    dist <- sqrt(x^2 + y^2)
    # use the close argument rather than a hard-coded 0.5
    closeReturn <- (dist < close)
    returns[i] <- sum(closeReturn)
  }
  # compute the table and the mean:
  tableReturns <- table(returns)
  meanReturns <- mean(returns)

  # handle output to console if user wants it
  if ( verbose ) {
    cat("Here is a table of the number of close returns:\n\n")
    print(prop.table(tableReturns))
    cat("\n")
    cat("The average number of close returns was: ",
        meanReturns, ".", sep = "")
  }

  # assemble the desired three items into a list
  # (for convenience, name the items)
  results <- list(tableReturns = tableReturns,
                  meanReturns = meanReturns,
                  returns = returns)
  # return the list
  invisible(results)
}
Now we can run the function simply to acquire the simulation results for later use:
simResults <- drunkenSimList(seed = 3939)
We can use any of the results at any time and in any way we like:
cat(
  "On the first ten repetitions, the number of close returns were:\n\n\t",
  simResults$returns[1:10]
)
On the first ten repetitions, the number of close returns were:
0 6 4 4 2 0 2 5 2 4
9.3.3 Storing Results in a List
Recall the Oz companions walking through a meadow, picking flowers (see Section 4.4). In a previous Practice Exercise (see Section 7.8.4), we stored the results of their meadow-walks in a data frame.
Sometimes it can be more convenient to store results in a list. Let’s modify our meadow-work to accomplish this.
Recall the flowers in the field:
flower_colors <- c("blue", "red", "pink", "crimson", "orange")
We write a helper-function to return the vector of flowers picked by a single person:
walk_meadow_vec <- function(color, wanted) {
  picking <- TRUE
  ## the following will be extended to hold the flowers picked:
  flowers_picked <- character()
  desired_count <- 0
  while (picking) {
    picked <- sample(flower_colors, size = 1)
    flowers_picked <- c(flowers_picked, picked)
    if (picked == color) desired_count <- desired_count + 1
    if (desired_count == wanted) picking <- FALSE
  }
  ## return the vector of flowers picked:
  flowers_picked
}
Now we write the function to make the list of results:
all_walk_list <- function(people, favs, numbers) {
  ## initialize a list of the required length:
  lst <- vector(mode = "list", length = length(people))
  for (i in 1:length(people)) {
    fav <- favs[i]
    number <- numbers[i]
    lst[[i]] <- walk_meadow_vec(
      color = fav,
      wanted = number
    )
  }
  ## give names:
  names(lst) <- people
  ## return the list
  lst
}
Try it out:
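The call below is a sketch of one way to try the function. The companions, their favorite colors, the wanted counts and the seed are all assumed values chosen for illustration, and the two functions are repeated so that the block runs on its own.

```r
flower_colors <- c("blue", "red", "pink", "crimson", "orange")

walk_meadow_vec <- function(color, wanted) {
  picking <- TRUE
  flowers_picked <- character()
  desired_count <- 0
  while (picking) {
    picked <- sample(flower_colors, size = 1)
    flowers_picked <- c(flowers_picked, picked)
    if (picked == color) desired_count <- desired_count + 1
    if (desired_count == wanted) picking <- FALSE
  }
  flowers_picked
}

all_walk_list <- function(people, favs, numbers) {
  lst <- vector(mode = "list", length = length(people))
  for (i in 1:length(people)) {
    lst[[i]] <- walk_meadow_vec(color = favs[i], wanted = numbers[i])
  }
  names(lst) <- people
  lst
}

set.seed(2020)  # assumed seed, only so that the walk can be repeated
walks <- all_walk_list(
  people = c("Dorothy", "Scarecrow", "Tin Man", "Lion"),
  favs = c("blue", "red", "pink", "orange"),
  numbers = c(2, 1, 3, 2)
)
names(walks)
lengths(walks)   # total number of flowers picked by each person
walks$Dorothy    # every flower Dorothy picked, in order
```

A list suits this task because each person's vector of flowers can have a different length, which a data frame with one column per person could not accommodate.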
9.3.4 Iterating Over a List
Lists are one-dimensional, so you can loop over them just as you would loop over an atomic vector. Sometimes this can be quite useful.
Here is a toy example. We will write a function that, when given a list of vectors, will return a vector consisting of the means of each of the vectors in the list.
means <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric(n)
  for (i in 1:n) {
    results[i] <- mean(vecs[[i]], ...)
  }
  results
}
Applying the function:
vec1 <- 1:5
vec2 <- 1:10
vec3 <- c(1:20, NA)
means(vecs = list(vec1, vec2, vec3), na.rm = TRUE)
[1] 3.0 5.5 10.5
9.3.5 A Note on Ellipses
The function means() of the previous sub-section contained a mysterious ... argument in its definition. This is known in R as the ellipsis argument, and it signals the possibility that one or more additional arguments may be supplied when the function is actually called.
The following function illustrates the operation of the ellipsis argument:
ellipisDemo <- function(...) {
  cat("I got the following arguments:\n\n")
  print(list(...))
}
Try it!
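For instance, a call along the following lines (the particular arguments are arbitrary choices for illustration) shows that both named and unnamed arguments are gathered into the list. The function is repeated so that the block runs on its own.

```r
ellipisDemo <- function(...) {
  cat("I got the following arguments:\n\n")
  print(list(...))
}

# named and unnamed arguments all end up in the list
ellipisDemo(x = 1:3, "hello", flag = TRUE)
```

In the printout the named arguments appear under $x and $flag, while the unnamed one is shown by its position, [[2]].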
At this point in our study of R, ... is useful in two ways.
9.3.5.1 Use #1: Passing Additional Arguments to Functions “Inside”
Look again at the code for the function means():
means <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric(n)
  for ( i in 1:n ) {
    results[i] <- mean(vecs[[i]], ...)
  }
  results
}
We plan to take the mean of some vectors and therefore the mean() function will be used in the body of means(). However we would like the user to be able to decide how mean() deals with NA-values. When we include the ellipsis argument in the definition of means() we have the option to pass its contents into mean(), and we exercise that option in the line:
results[i] <- mean(vecs[[i]], ...)
Now we can see what happens in the call:
means(vecs = list(vec1, vec2, vec3), na.rm = TRUE)
The ellipsis argument will consist of the argument na.rm = TRUE, hence the call to mean() inside the loop is equivalent to:
results[i] <- mean(vecs[[i]], na.rm = TRUE)
Consider, on the other hand, the call:
means(vecs = list(vec1, vec2, vec3))
Now the ellipsis is empty. In this case the code in the loop will be equivalent to:
results[i] <- mean(vecs[[i]])
As a result, mean() will use the default value of na.rm, which is FALSE. For any input-vector having NA-values, the mean will be computed as NA. Try it!
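The two calls can be compared side by side in the sketch below, which repeats means() and the three vectors so that it runs on its own.

```r
means <- function(vecs = list(), ...) {
  n <- length(vecs)
  if ( n == 0 ) {
    return(cat("Need some vectors to work with!"))
  }
  results <- numeric(n)
  for ( i in 1:n ) {
    results[i] <- mean(vecs[[i]], ...)
  }
  results
}

vec1 <- 1:5
vec2 <- 1:10
vec3 <- c(1:20, NA)

# ellipsis empty: mean() falls back on na.rm = FALSE
means(vecs = list(vec1, vec2, vec3))

# na.rm = TRUE travels through ... to mean()
means(vecs = list(vec1, vec2, vec3), na.rm = TRUE)
```

Only the third result differs between the two calls, since vec3 is the only vector that contains an NA.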
9.3.5.2 Use #2: Permitting Any Number of Arguments
Another application of the ellipsis argument is in the writing of functions where the number of “primary” arguments is not determined in advance.
We have seen a few R-functions that can deal with any number of arguments. cat() is an example:
cat("argument one,", "argument two,", "and as many more as you like!")
argument one, argument two, and as many more as you like!
With the ellipsis argument we can do this sort of thing ourselves. For example, here is a function that takes any number of vectors as arguments and determines whether the vectors are all of the same length:
sameLength <- function(...) {
  vecs <- list(...)
  numVecs <- length(vecs)
  if ( numVecs <= 1 ) {
    return(cat("Need two or more vectors."))
  }
  allSame <- TRUE
  len <- length(vecs[[1]])
  for ( i in 2:numVecs ) {
    if ( length(vecs[[i]]) != len ) {
      allSame <- FALSE
      break
    }
  }
  allSame
}
We can give this function two or more vectors, as follows:
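For instance, in the sketch below the input vectors are made up for illustration, and the function is repeated so that the block runs on its own. The first call passes three vectors of length 3, while the second passes vectors of lengths 3 and 26.

```r
sameLength <- function(...) {
  vecs <- list(...)
  numVecs <- length(vecs)
  if ( numVecs <= 1 ) {
    return(cat("Need two or more vectors."))
  }
  allSame <- TRUE
  len <- length(vecs[[1]])
  for ( i in 2:numVecs ) {
    if ( length(vecs[[i]]) != len ) {
      allSame <- FALSE
      break
    }
  }
  allSame
}

sameLength(1:3, c("a", "b", "c"), c(TRUE, FALSE, NA))   # all of length 3
sameLength(1:3, letters)                               # lengths 3 and 26
```

The first call returns TRUE and the second returns FALSE. Note that the types of the vectors play no role; only their lengths are compared.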
9.4 More in Depth
9.4.1 Investigate Your Object: str() and Lists
Let’s reconsider the Meetup Simulation from Section 6.5:
meetupSim <- function(reps = 10000, table = FALSE, seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  anna <- runif(reps, 0, 60)
  raj <- runif(reps, 0, 60)
  connect <- (abs(anna - raj) < 10)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(connect))
    cat("\n")
  }
  cat("The proportion of times they met was ", mean(connect), ".\n", sep = "")
}
You will recall that when the user asks for a table of results, the function prints out a table that looks like this:
Here is a table of the results:
connect
FALSE TRUE
69781 30219
There are a couple of small irritations, here:
- The name of the table (“connect”) appears in the output, even though it was a name that was given in the code internal to the function. As a name for the output-table, it’s not the most descriptive choice. Besides, we really don’t need a name here, because we have just cat-ed out a sentence that introduces the table.
- The names for the columns (FALSE and TRUE) again pertain to features internal to the code of the function. The user should see more descriptive names.
In order to investigate how we might deal with these issues, let’s create a small table here:
logicalVector <- c(rep(TRUE, 6), rep(FALSE, 4))
tab <- table(logicalVector)
tab
logicalVector
FALSE TRUE
4 6
One way to deal with the column-name issues might be to isolate each table value and then repackage the values. We can access the individual table-values with sub-setting. For example, the first value is:
tab[1]
FALSE
4
Hence we could grab the values, create a vector from them, and then provide names for the vector that we like. Thus:
results <- c(tab[1], tab[2])
names(results) <- c("did not meet", "met")
results
did not meet met
4 6
Another approach—and this is the more instructive and generally-useful procedure—is to begin by looking carefully at the structure of the problematic object:
str(tab)
'table' int [1:2(1d)] 4 6
- attr(*, "dimnames")=List of 1
..$ logicalVector: chr [1:2] "FALSE" "TRUE"
We see that
- the table has an attribute called dimnames.
- dimnames is a list of length one.
- It is a named list. The name of its only element is logicalVector.
- The elements of this vector are the column names for the table.
If you would like to see the dimnames attribute all by itself, you can access it with the attr() function:
attr(tab, which = "dimnames") # "which" says which attribute you want!
$logicalVector
[1] "FALSE" "TRUE"
You can also use attr() to set the values of an attribute. Here, we want dimnames to be a list of length one that does not have a name for its sole element. The following should do the trick:
attr(tab, which = "dimnames") <- list(c("did not meet", "met"))
Let’s see if this worked:
tab
did not meet met
4 6
It appears to have worked very nicely! Hence we may rewrite meetupSim() as follows:
meetupSim <- function(reps = 10000, table = FALSE, seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  anna <- runif(reps, 0, 60)
  raj <- runif(reps, 0, 60)
  connect <- (abs(anna - raj) < 10)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    tab <- table(connect)
    attr(tab, which = "dimnames") <- list(c("did not meet", "met"))
    print(tab)
    cat("\n")
  }
  cat("The proportion of times they met was ", mean(connect), ".\n", sep = "")
}
Let’s try it out:
meetupSim(reps = 100000, table = TRUE, seed = 3939)
Here is a table of the results:

did not meet          met 
       69781        30219 

The proportion of times they met was 0.30219.
Much better!
Make a habit of examining your objects with the str() function. Combining str() with your abilities to manipulate lists allows you to access and set pieces of the object in helpful ways.
The dimnames attribute for tables and matrices is so frequently used that it has its own special function for accessing and setting: dimnames(). Other popular attributes, such as names for a vector and levels for a factor, also have dedicated access/set functions: names() and levels() respectively. But keep in mind that you can access and set the values for any attribute at all with the attr() function.
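As a brief sketch of that equivalence, the block below sets the table's column names with dimnames() instead of attr(), and then checks that names() and levels() report the same values as attr(). The objects vec and fac are hypothetical examples introduced only for this illustration.

```r
logicalVector <- c(rep(TRUE, 6), rep(FALSE, 4))
tab <- table(logicalVector)

# dedicated function: same effect as setting the attribute with attr()
dimnames(tab) <- list(c("did not meet", "met"))
tab

# names() reads and sets the "names" attribute of a vector
vec <- c(1, 2, 3)
names(vec) <- c("a", "b", "c")
attr(vec, which = "names")

# levels() reads the "levels" attribute of a factor
fac <- factor(c("lo", "hi", "lo"))
levels(fac)
attr(fac, which = "levels")
```

The dedicated functions are usually easier to read, while attr() remains the general tool for attributes that have no function of their own.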
9.4.2 Practice Exercises
- Consider the following matrix:
myMat <- matrix(1:24, nrow = 4)
rownames(myMat) <- letters[1:4]
colnames(myMat) <- LETTERS[1:6]
myMat
A B C D E F
a 1 5 9 13 17 21
b 2 6 10 14 18 22
c 3 7 11 15 19 23
d 4 8 12 16 20 24
Find a way to change the row names of myMat to “x”, “y”, “z” and “w”, using the attr() function rather than the rownames() function.
9.4.3 Solutions to Practice Exercises
- First, run str(myMat). You find that it has an attribute called dimnames that is a list of length 2. The first element of this list is the vector of row names. Hence you need to assign new row names to this element. You can do so as follows:
attr(myMat, which = "dimnames")[[1]] <- c("x", "y", "z", "w")
myMat
A B C D E F
x 1 5 9 13 17 21
y 2 6 10 14 18 22
z 3 7 11 15 19 23
w 4 8 12 16 20 24
It worked!
Glossary
- List: A heterogeneous vector; that is, a vector whose elements can be any sort of R-object.
Exercises
Exercise 1
We are given the following list:
lst <- list(
  yabba = letters,
  dabba = list(
    x = LETTERS,
    y = 1:10
  ),
  do = bcscr::m111survey
)
One way to access the letter “b” in the first element of lst is as follows:
lst$yabba[2]
[1] "b"
Another way is:
lst[[1]][2]
[1] "b"
For each of the following objects, find at least two ways to access it within lst:
- the vector of letters from “c” to “j”;
- the capital letter “F”;
- the vector of numbers from 1 to 10;
- the heights of the five tallest individuals in m111survey (see Data Table 7.1).
Exercise 2
Write a function called goodStats() that, when given a vector of numerical values, computes the mean, median and standard deviation of the values, and returns these values in a list. The function should take two parameters:
- x: the vector of numerical values;
- ...: the ellipses, which allow the user to pass in additional arguments.
The list returned should name each of the three quantities:
- the name of the mean should be mean;
- the name of the standard deviation should be sd;
- the name of the median should be median.
Typical examples of use should look like this:
vec <- 1:5
goodStats(x = vec)
$mean
[1] 3
$sd
[1] 1.581139
$median
[1] 3
vec <- c(3, 7, 9, 11, NA)
myStats <- goodStats(x = vec, na.rm = TRUE)
myStats$mean
[1] 7.5