2.6 Subsetting with Logical Vectors
The subsetting we have seen up to now involves specifying the indices of the elements we would like to select from the original vector. It is also possible to say, for each element, whether or not it is to be included in our selection. This is accomplished by means of logical vectors.
Recall our heights vector:
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 73 70 69 60 NA 46
Let’s say that we want the heights of Scarecrow, Tinman and Dorothy. We can use a logical vector to do this:
wanted <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
heights[wanted]
## Scarecrow Tinman Dorothy
## 73 70 60
The TRUE’s at indices 1, 2, and 4 in wanted inform R that we want the elements of the heights vector at indices 1, 2 and 4. The FALSE’s say: “don’t include this element!”
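As a quick check, here is a minimal sketch showing that the logical selection gives the same result as selecting by the indices 1, 2 and 4. It assumes heights is the named vector printed above; the vector is re-created so that the snippet runs by itself, and the names byLogical and byIndex are introduced only for illustration.

```r
# Re-create the heights vector shown earlier in this section.
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)

wanted <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

byLogical <- heights[wanted]      # one TRUE/FALSE decision per element
byIndex <- heights[c(1, 2, 4)]    # the positions where wanted is TRUE

identical(byLogical, byIndex)
## [1] TRUE
```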
Subsetting can be used powerfully along with logical vectors and Boolean operators.
For example, in order to select those persons whose heights exceed a certain amount, we might say something like this:
# heights of some people:
people <- c(55, 64, 67, 70, 63, 72)
tall <- (people >= 70)
tall
## [1] FALSE FALSE FALSE TRUE FALSE TRUE
people[tall]
## [1] 70 72
As you can see, the tall vector specifies which elements we would like to select from the people vector.
We need not define the tall vector along the way. It is quite common to see something like the following:
people[people >= 70]
## [1] 70 72
I like to pronounce the above as:
people, where people is at least 70
The word “where” in the above phrase corresponds to the subsetting operator.
Your subsetting logical vector need not have been constructed with the original vector in mind. Consider the following example:
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
age[height < 70]
## [1] 23 21 63
Here the selection is done from the age vector, using a logical vector that was constructed from height, another vector altogether. It concisely expresses the idea:
the ages of people whose height is less than 70
There is no limit to the complexity of selection. Consider the following:
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
height[age < 60 & likesToto]
## [1] 68 67
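It can help to build a compound condition one piece at a time. The sketch below repeats the selection above, but stores each intermediate logical vector first; the names young and youngAndLikesToto are hypothetical and are not used elsewhere in this chapter.

```r
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# Step 1: who is younger than 60?
young <- (age < 60)
young
## [1]  TRUE  TRUE  TRUE  TRUE FALSE

# Step 2: combine element by element with likesToto.
youngAndLikesToto <- young & likesToto
youngAndLikesToto
## [1]  TRUE  TRUE FALSE FALSE FALSE

# Step 3: use the combined vector to select from height.
height[youngAndLikesToto]
## [1] 68 67
```

The one-line version and the step-by-step version select the same elements; the longer form is simply easier to inspect when a selection does not return what you expect.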
2.6.1 Counting
Logical subsetting provides a convenient way to count the elements of a vector that possess a given property. For example, to find out how many elements of people are less than 70 we could say:
length(people[people < 70])
## [1] 4
2.6.2 Cautions about NA
You should be aware of the effect of NA values on subsetting.
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 73 70 69 60 NA 46
tall <- (heights > 65)
tall
## Scarecrow Tinman Lion Dorothy Toto Boq
## TRUE TRUE TRUE FALSE NA FALSE
Since Toto’s height was missing, R can’t say whether or not he was more than 65 inches tall. Hence it assigns NA to the Toto-element of the tall vector.
When we subset using this vector we get an odd result:
heights[tall]
## Scarecrow Tinman Lion <NA>
## 73 70 69 NA
Since R doesn’t know whether or not to select Toto, it records its indecision by including an NA in the result. That NA, however, is not the NA for Toto’s height in the vector heights, so it can’t inherit the “Toto” name. Since it has no name, R presents its name as <NA>.
If we try to count the number of tall persons, we get a misleading result:
length(heights[tall])
## [1] 4
We would have preferred something like:
“Three, with another one undecided.”
Counting is one of those situations in which we might wish to remove NA values at the start. If the vector is small we could remove them by hand, e.g.:
knownHeights <- heights[-5] # remove Toto
tall <- (knownHeights > 65)
length(knownHeights[tall])
## [1] 3
For longer vectors the above approach won’t be practical. Instead we may use the is.na() function.
is.na(heights)
## Scarecrow Tinman Lion Dorothy Toto Boq
## FALSE FALSE FALSE FALSE TRUE FALSE
Then we may select those elements that are not NA:
knownHeights <- heights[!is.na(heights)]
knownHeights
## Scarecrow Tinman Lion Dorothy Boq
## 73 70 69 60 46
length(knownHeights[knownHeights > 65])
## [1] 3
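The two steps can also be folded into a single condition. The following is a sketch of one possible way, assuming the same heights vector as above (re-created here so the code runs by itself); the name knownTall is introduced only for this illustration.

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)

# For Toto, !is.na(heights) is FALSE and heights > 65 is NA.
# FALSE & NA evaluates to FALSE, so Toto is excluded outright
# instead of being left undecided.
knownTall <- !is.na(heights) & heights > 65

unname(heights[knownTall])   # drop the names for compact printing
## [1] 73 70 69

length(heights[knownTall])
## [1] 3
```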
2.6.3 Which, Any, All
There are several functions on logical vectors that are worth keeping in your back pocket:
which()
any()
all()
2.6.3.1 which()
Applied to a logical vector, the which() function returns the indices of the vector that have the value TRUE:
boolVec <- c(TRUE, TRUE, FALSE, TRUE)
which(boolVec)
## [1] 1 2 4
Thus if we want to know the indices of heights where the heights are more than 65, then we write:
which(heights > 65)
## Scarecrow Tinman Lion
## 1 2 3
(Recall that heights is a named vector. The logical vector heights > 65 inherited these names and passed them on to the result of which().)
Note also that Toto’s NA height was ignored by which().
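Because which() skips NA, subsetting with its result avoids the extra <NA> element that appears when we subset with the logical vector directly. Here is a small sketch of the difference, assuming the heights vector from earlier (re-created below); tallIndices is a name made up for this example.

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)

# Subsetting with the logical vector keeps an NA for Toto.
length(heights[heights > 65])
## [1] 4

# which() returns only the positions that are definitely TRUE.
tallIndices <- which(heights > 65)
length(heights[tallIndices])
## [1] 3
```

Keep in mind that this silently drops the undecided cases, which may or may not be what you want.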
2.6.3.2 any()
Is anyone more than 71 inches tall? any() will tell us:
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 73 70 69 60 NA 46
any(heights > 71)
## [1] TRUE
Yes: the Scarecrow is more than 71 inches tall.
We can use any() along with the equality Boolean operator == to determine whether or not a given value appears in a given vector:
vec <- c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
any(vec == "Tin Man")
## [1] TRUE
any(vec == "Wizard")
## [1] FALSE
The above question occurs so frequently that R provides the %in% operator as a short-cut:
"Tin Man" %in% vec
## [1] TRUE
"Wizard" %in% vec
## [1] FALSE
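The left-hand side of %in% may itself be a vector, in which case the result is a logical vector with one answer per element. Since that result is a logical vector, it can be used for subsetting. The sketch below assumes the vec defined above; candidates and found are hypothetical names chosen for illustration.

```r
vec <- c("Dorothy", "Tin Man", "Scarecrow", "Glinda")

# Hypothetical names that we want to look up in vec:
candidates <- c("Tin Man", "Wizard", "Glinda")

found <- candidates %in% vec
found
## [1]  TRUE FALSE  TRUE

# Keep only the candidates that appear in vec.
candidates[found]
## [1] "Tin Man" "Glinda"
```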
2.6.3.3 all()
Is everyone more than 71 inches tall?
all(heights > 71)
## [1] FALSE
2.6.3.4 NA-Caution
Is everyone more than 40 inches tall?
all(heights > 40)
## [1] NA
Everyone with a known height is taller than 40 inches, but because Toto’s height is NA, R can’t say whether all the heights are bigger than 40.
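Both all() and any() accept an na.rm argument that removes NA values before the check is made. The sketch below, which assumes the heights vector from earlier (re-created here), shows how the answers change. Note that any() only returns NA when no known element settles the question.

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)

# Ignore Toto's missing height when checking everyone:
all(heights > 40, na.rm = TRUE)
## [1] TRUE

# One known TRUE (the Scarecrow) is enough for any(), despite the NA:
any(heights > 71)
## [1] TRUE

# No known height exceeds 80, so Toto's NA leaves the answer undecided:
any(heights > 80)
## [1] NA

any(heights > 80, na.rm = TRUE)
## [1] FALSE
```

Whether it is appropriate to set na.rm = TRUE depends on the question: it answers “among the known heights” rather than “among everyone”.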
2.6.4 Practice Exercises
Consider the following vectors:
person <- c("Abe", "Bettina", "Candace", "Devadatta", "Esmeralda")
numberKids <- c(2, 1, 0, 2, 3)
yearsEducation <- c(12, 16, 13, 14, 18)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
Think of these vectors as providing information about six people.
Write a command that produces the names of people who have more than 1 child.
Write a command that produces the numbers of children of people who have a pet.
Write a command that produces the years of education of people who have at least 13 years of education.
Write a command that produces the names of people who have more than one child and fewer than 15 years of education.
Write a command that produces the names of people who don’t have pets.
Write a command that produces the number of people who have pets.
Write a command that produces the number of people who don’t have pets.
Write a command that says whether or not there is someone who has more than 15 years of education and at least one child, but doesn’t have any pets.
2.6.5 Solutions to the Practice Exercises
person[numberKids > 1]
numberKids[hasPets]
yearsEducation[yearsEducation >= 13]
person[numberKids > 1 & yearsEducation < 15]
person[!hasPets]
Here is one way. We’ll learn an easier way in the next section.
length(person[hasPets])
## [1] 3
Here is one way. We’ll learn an easier way in the next section.
length(person[!hasPets])
## [1] 3
any(yearsEducation > 15 & numberKids >= 1 & !hasPets)