## 2.6 Subsetting with Logical Vectors

The subsetting we have seen up to now involves specifying the *indices* of the elements we would like to select from the original vector. It is also possible to say, for each element, *whether or not it is to be included* in our selection. This is accomplished by means of logical vectors.

Recall our `heights` vector:

`heights`

```
## Scarecrow    Tinman      Lion   Dorothy      Toto       Boq
##        73        70        69        60        NA        46
```

Let’s say that we want the heights of Scarecrow, Tinman and Dorothy. We can use a logical vector to do this:

```
wanted <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
heights[wanted]
```

```
## Scarecrow    Tinman   Dorothy
##        73        70        60
```

The `TRUE`s at indices 1, 2, and 4 in `wanted` inform R that we want the elements of the `heights` vector at indices 1, 2 and 4. The `FALSE`s say: “don’t include this element!”
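As a quick check, the logical selection picks out the same elements as the index-based subsetting we used before. The sketch below rebuilds `heights` so that it runs on its own; the names `byLogical` and `byIndex` are introduced here only for illustration.

```r
# Rebuild the named heights vector so the example is self-contained
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)

wanted <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

byLogical <- heights[wanted]      # select where wanted is TRUE
byIndex   <- heights[c(1, 2, 4)]  # select by position

# Compare the two results, names included
identical(byLogical, byIndex)
```

The call to `identical()` returns `TRUE`: both approaches yield the same named vector.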

Logical subsetting becomes especially powerful when the logical vector is itself produced by comparison and Boolean operators.

For example, in order to select those persons whose heights exceed a certain amount, we might say something like this:

```
# heights of some people:
people <- c(55, 64, 67, 70, 63, 72)
tall <- (people >= 70)
tall
```

`## [1] FALSE FALSE FALSE TRUE FALSE TRUE`

`people[tall]`

`## [1] 70 72`

As you can see, the `tall` vector specifies which elements we would like to select from the `people` vector.

We need not define the `tall` vector along the way. It is quite common to see something like the following:

`people[people >= 70]`

`## [1] 70 72`

I like to pronounce the above as:

`people`, where `people` is at least 70

The word “where” in the above phrase corresponds to the subsetting operator.

Your subsetting logical vector need not have been constructed with the original vector in mind. Consider the following example:

```
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
age[height < 70]
```

`## [1] 23 21 63`

Here the selection is done from the `age` vector, using a logical vector that was constructed from `height`, another vector altogether. It concisely expresses the idea:

the ages of people whose height is less than 70

Selection conditions can be made as complex as we like by combining them with Boolean operators. Consider the following:

```
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
height[age < 60 & likesToto]
```

`## [1] 68 67`
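The other Boolean operators combine in the same way. As a small sketch using the same three vectors, here is a selection with `|` (or) and one with `!` (not); the names `eitherOne` and `tallNonFans` are made up for this illustration.

```r
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# heights of people who are under 60 OR like Toto
eitherOne <- height[age < 60 | likesToto]

# ages of people who do NOT like Toto and are at least 70 inches tall
tallNonFans <- age[!likesToto & height >= 70]

eitherOne
tallNonFans
```

Everyone is either under 60 or likes Toto, so `eitherOne` contains all five heights, while `tallNonFans` contains the ages 22 and 25.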

### 2.6.1 Counting

Logical subsetting provides a convenient way to *count* the elements of a vector that possess a given property. For example, to find out how many elements of `people` are less than 70 we could say:

`length(people[people < 70])`

`## [1] 4`
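The same counting pattern extends to compound conditions. As a small sketch, here we count the elements of `people` that are at least 60 but less than 70; the cutoffs and the name `inSixties` are chosen only for illustration.

```r
# heights of some people:
people <- c(55, 64, 67, 70, 63, 72)

# count the heights that are at least 60 but below 70
inSixties <- people[people >= 60 & people < 70]
length(inSixties)
```

Three elements (64, 67 and 63) satisfy both conditions, so the count is 3.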

### 2.6.2 Cautions about NA

You should be aware of the effect of `NA` values on subsetting.

`heights`

```
## Scarecrow    Tinman      Lion   Dorothy      Toto       Boq
##        73        70        69        60        NA        46
```

```
tall <- (heights > 65)
tall
```

```
## Scarecrow    Tinman      Lion   Dorothy      Toto       Boq
##      TRUE      TRUE      TRUE     FALSE        NA     FALSE
```

Since Toto’s height was missing, R can’t say whether or not he was more than 65 inches tall. Hence it assigns `NA` to the Toto-element of the `tall` vector.

When we subset using this vector we get an odd result:

`heights[tall]`

```
## Scarecrow    Tinman      Lion      <NA>
##        73        70        69        NA
```

Since R doesn’t know whether or not to select Toto, it records its indecision by including an `NA` in the result. That `NA`, however, is not the `NA` for Toto’s height in the vector `heights`, so it can’t inherit the “Toto” name. Since it has no name, R presents its name as `<NA>`.

If we try to count the number of tall persons, we get a misleading result:

`length(heights[tall])`

`## [1] 4`

We would have preferred something like:

“Three, with another one undecided.”

Counting is one of those situations in which we might wish to remove `NA` values at the start. If the vector is small we could remove them by hand, e.g.:

```
knownHeights <- heights[-5] # remove Toto
tall <- (knownHeights > 65)
length(knownHeights[tall])
```

`## [1] 3`

For longer vectors the above approach won’t be practical. Instead we may use the `is.na()` function.

`is.na(heights)`

```
## Scarecrow    Tinman      Lion   Dorothy      Toto       Boq
##     FALSE     FALSE     FALSE     FALSE      TRUE     FALSE
```

Then we may select those elements that are *not* `NA`:

```
knownHeights <- heights[!is.na(heights)]
knownHeights
```

```
## Scarecrow    Tinman      Lion   Dorothy       Boq
##        73        70        69        60        46
```

`length(knownHeights[knownHeights > 65])`

`## [1] 3`
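The two steps can also be folded into a single logical vector. The sketch below, which rebuilds `heights` so that it runs on its own, relies on the fact that `FALSE & NA` evaluates to `FALSE` in R; the name `knownAndTall` is introduced only for this illustration.

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)

# TRUE only where the height is known AND exceeds 65;
# for Toto, !is.na() is FALSE, and FALSE & NA is FALSE
knownAndTall <- !is.na(heights) & heights > 65

heights[knownAndTall]
length(heights[knownAndTall])
```

Because `knownAndTall` contains no `NA` values, the selection holds only Scarecrow, Tinman and Lion, and the count is 3.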

### 2.6.3 Which, Any, All

There are several functions on logical vectors that are worth keeping in your back pocket:

- `which()`
- `any()`
- `all()`

#### 2.6.3.1 `which()`

Applied to a logical vector, the `which()` function returns the *indices* of the vector that have the value `TRUE`:

```
boolVec <- c(TRUE,TRUE,FALSE,TRUE)
which(boolVec)
```

`## [1] 1 2 4`

Thus if we want to know the indices of `heights` where the heights exceed 65, then we write:

`which(heights > 65)`

```
## Scarecrow    Tinman      Lion
##         1         2         3
```

(Recall that `heights` is a named vector. The logical vector `heights > 65` inherited these names and passed them on to the result of `which()`.)

Note also that Toto’s `NA` height was ignored by `which()`.
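Because `which()` skips `NA` values, its result can be used to subset without picking up the stray `NA` entry we met earlier. A minimal sketch, rebuilding `heights` so that it runs on its own (the names `withNA` and `withoutNA` are made up for this illustration):

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)

# Logical subsetting keeps an NA entry for Toto
withNA <- heights[heights > 65]

# Subsetting by the indices from which() drops it
withoutNA <- heights[which(heights > 65)]

length(withNA)
length(withoutNA)
```

The first length is 4 and the second is 3, since only the index-based selection leaves out the undecided element.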

#### 2.6.3.2 `any()`

Is anyone more than 71 inches tall? `any()` will tell us:

`heights`

```
## Scarecrow    Tinman      Lion   Dorothy      Toto       Boq
##        73        70        69        60        NA        46
```

`any(heights > 71)`

`## [1] TRUE`

Yes: the Scarecrow is more than 71 inches tall.

We can use `any()` along with the equality Boolean operator `==` to determine whether or not a given value appears in a given vector:

```
vec <- c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
any(vec == "Tin Man")
```

`## [1] TRUE`

`any(vec == "Wizard")`

`## [1] FALSE`

The above question occurs so frequently that R provides the `%in%` operator as a short-cut:

`"Tin Man" %in% vec`

`## [1] TRUE`

`"Wizard" %in% vec`

`## [1] FALSE`
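The left-hand side of `%in%` may itself be a vector, in which case the result is a logical vector with one answer per element. That makes it usable for subsetting, as in this small sketch (the `candidates` vector and the name `found` are made up for illustration):

```r
vec <- c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
candidates <- c("Tin Man", "Wizard", "Glinda")

# one TRUE/FALSE per candidate: does it appear in vec?
found <- candidates %in% vec
found

# keep only the candidates that do appear in vec
candidates[found]
```

Here `found` is `TRUE FALSE TRUE`, so the selection keeps `"Tin Man"` and `"Glinda"`.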

#### 2.6.3.3 `all()`

Is everyone more than 71 inches tall?

`all(heights > 71)`

`## [1] FALSE`

#### 2.6.3.4 NA-Caution

Is everyone more than 40 inches tall?

`all(heights > 40)`

`## [1] NA`

Everyone with a known height is taller than 40 inches, but because Toto’s height is `NA`, R can’t say whether *all* the heights are bigger than 40.
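If we are content to answer the question for the known heights only, both `all()` and `any()` accept an `na.rm = TRUE` argument that drops `NA` values before checking. The sketch below rebuilds `heights` so that it runs on its own; the cutoff of 80 and the variable names are chosen only for illustration.

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)

allUnknown <- all(heights > 40)                # NA: Toto is undecided
allKnown   <- all(heights > 40, na.rm = TRUE)  # TRUE for the known heights

anyTall      <- any(heights > 71)                # TRUE: Scarecrow settles it
anyVeryTall  <- any(heights > 80)                # NA: nobody known, Toto undecided
anyVeryTall2 <- any(heights > 80, na.rm = TRUE)  # FALSE among the known heights
```

Note that `any()` returns `NA` only when no known element is `TRUE`; a single known `TRUE` is enough to settle the answer.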

### 2.6.4 Practice Exercises

Consider the following vectors:

```
person <- c("Abe", "Bettina", "Candace", "Devadatta", "Esmeralda")
numberKids <- c(2, 1, 0, 2, 3)
yearsEducation <- c(12, 16, 13, 14, 18)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE)
```

Think of these vectors as providing information about five people.

1. Write a command that produces the names of people who have more than 1 child.

2. Write a command that produces the numbers of children of people who have a pet.

3. Write a command that produces the years of education of people who have at least 13 years of education.

4. Write a command that produces the names of people who have more than one child and fewer than 15 years of education.

5. Write a command that produces the names of people who don’t have pets.

6. Write a command that produces the number of people who have pets.

7. Write a command that produces the number of people who don’t have pets.

8. Write a command that says whether or not there is someone who has more than 15 years of education and at least one child, but doesn’t have any pets.

### 2.6.5 Solutions to the Practice Exercises

1. `person[numberKids > 1]`

2. `numberKids[hasPets]`

3. `yearsEducation[yearsEducation >= 13]`

4. `person[numberKids > 1 & yearsEducation < 15]`

5. `person[!hasPets]`

6. Here is one way. We’ll learn an easier way in the next section.

   `length(person[hasPets])`

   `## [1] 2`

7. Here is one way. We’ll learn an easier way in the next section.

   `length(person[!hasPets])`

   `## [1] 3`

8. `any(yearsEducation > 15 & numberKids >= 1 & !hasPets)`