2 Vectors

This chapter gets you started officially with R. While the theme is vectors, the most important data structure in R, we’ll also learn about variables and variable names, vector types, reserved words, assignment, and many of R’s basic operators.

2.1 What is a Vector?

If you have heard of vectors before in mathematics, you might think of a vector as something that has a magnitude and a direction, and that can be represented by a sequence of numbers. In its notion of a vector, R keeps the idea of a sequence but discards magnitude and direction. The notion of “numbers” isn’t even necessary.

For R, a vector is simply a sequence of elements. There are two general sorts of vectors:

atomic vectors that come in one of six forms called vector types;
non-atomic vectors, called lists, whose elements can be any sort of R-object at all.

For now we’ll just study atomic vectors. Let’s make a few vectors, as examples.

We can make a vector of numbers using the c() function. Try this:

You can think of c as standing for “combine.” c() takes its arguments, all of which are separated by commas, and combines them to make a vector.

If you closely examine the output from running the previous code, you’ll notice that R printed out all of the numerical values in the vector to three decimal places, which happened to be the largest number of decimal places we assigned to any of the numbers that made up numVec. You’ll also notice the numbers in brackets at the beginning of the lines. Each number represents the position within the vector occupied by the first element of the vector that is printed on the line. The position of an element in a vector is called its index. Reporting the indices of leading elements helps you locate particular elements in the output.
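As a minimal sketch of the printing behaviour just described, the following uses a made-up vector. The name exampleVec and its values are assumptions for illustration, not the chapter's own numVec:

```r
# hypothetical values; one of them is given to three decimal places
exampleVec <- c(23.2, 45, 63.181, 12, 5.5)

# when printed, every element is shown with the same number of decimal places
exampleVec

# the index of an element is its position; this vector has indices 1 through 5
length(exampleVec)
```

With a longer vector the output wraps onto several lines, and each line then starts with the bracketed index of its first element.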

2.1.1 Types of Atomic Vectors

The numbers in numVec are what programmers call double-precision numbers. You can verify this for yourself with the typeof() function:

typeof(numVec)

[1] "double"

The typeof() function returns the type of any object in R. As far as vectors are concerned, there are six possible types, of which we will deal with only four:

double
integer
character
logical

Let’s look at examples of the other types. Here is a vector of type integer:

intVec <- c(3L, 17L, -22L, 45L)
intVec

[1]   3  17 -22  45

The L after each number signifies to R that the number should be stored in memory as an integer, rather than in double-precision format. Officially, the type is integer:

typeof(intVec)

[1] "integer"

You should know that if you left off one or more of the L’s, then R would create a vector of type double:

numVec2 <- c(3, 17, -22, 45)
typeof(numVec2)

[1] "double"

We won’t work much with integer-type vectors, but you’ll see them out in the wild.

We can also make vectors out of pieces of text called strings: these are called character vectors. As noted in the previous chapter, we use quotes to delimit strings:

strVec <- c(
  "Brains", "are", "not", "the", "best", 
  "things", "in", "the", "world", "93.2"
)
strVec

 [1] "Brains" "are"    "not"    "the"    "best"   "things" "in"     "the"   
 [9] "world"  "93.2"

typeof(strVec)

[1] "character"

Notice that "93.2" makes a string, not a number.

The last type of vectors to consider are the logical vectors. Here is an example:

logVec <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
logVec

[1]  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE

In order to represent a logical value you use:

TRUE to represent truth;
FALSE to represent falsity.

You need all-caps: if you try anything else—like the following—you get an error (try it):
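As a minimal sketch of what goes wrong, the block below tries the spelling True. It wraps the attempt in tryCatch(), a function not otherwise covered in this chapter, only so that the error message can be captured and displayed instead of halting the script. The name attempt is an assumption for illustration:

```r
attempt <- tryCatch(
  c(True, FALSE),   # True is not a logical value; R looks for an object by that name
  error = function(e) conditionMessage(e)
)

# attempt now holds the text of the error message rather than a logical vector
attempt
```

Typing c(True, FALSE) directly at the console shows the same error without the wrapper.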

2.1.2 Coercion

What would happen if you tried to represent falsity with the string "false"?

newVector <- c(TRUE, "false")
newVector

[1] "TRUE"  "false"

newVector is not a logical vector. Check it out:

typeof(newVector)

[1] "character"

In order to understand what just happened here, you must recall that all of the elements of an atomic vector have to be of the same type. If the c() function is presented with values of different types, then R follows a set of internal rules to coerce some of the values to a new type in such a way that all resulting values are of the same type. You don’t need to know all of the coercion rules, but it’s worth noting that

character beats double,
which in turn beats integer,
which in turn beats logical.

The following examples show this:

typeof(c("one", 1, 1L, TRUE))

[1] "character"

typeof(c(1, 1L, TRUE))

[1] "double"

typeof(c(1L, TRUE))

[1] "integer"

Automatic coercion can be convenient in some circumstances, but in others it can give unexpected results. It’s best to keep track of what types you are dealing with and to exercise caution when combining values to make new vectors.

You can also coerce vectors “manually” with the functions:

as.numeric();
as.integer();
as.character();
as.logical().

Here are some examples:

numVec <- c(3, 2.5, -7.32, 0)
as.character(numVec)

[1] "3"     "2.5"   "-7.32" "0"

as.integer(numVec)

[1]  3  2 -7  0

as.logical(numVec)

[1]  TRUE  TRUE  TRUE FALSE

Note that in coercion from numerical to logical, the number 0 becomes FALSE and all non-zero numbers become TRUE.

2.1.3 Combining Vectors

You can combine vectors you have already created to make new, bigger ones:

numVec1 <- c(5, 3, 10)
numVec2 <- c(1, 2, 3, 4, 5, 6)
numCombined <- c(numVec1, numVec2)
numCombined

[1]  5  3 10  1  2  3  4  5  6

You can see here that vectors are different from sets: they are allowed to repeat the same value in different indices, as we see in the case of the 3’s above.

2.1.4 NA Values

Consider the following vector, which we may think of as recording the heights of people, in inches:

heights <- c(72, 70, 69, 58, NA, 45)

The NA in the fifth position of the vector is a special value that stands for “Not Available.” We can use it to say that a value was not recorded or has gone missing for some reason. R will often use it to say (more or less) that it cannot do what we asked. For example, consider:

as.numeric("four")

Warning: NAs introduced by coercion

[1] NA

R is not programmed to transform the string "four" to any particular number, so it coerces the string to NA (and issues a warning in case you did something you didn’t intend to do).

2.1.5 “Everything in R is a Vector”

Some folks say that everything in R is a vector. That’s a bit of an exaggeration but it’s remarkably close to the truth.

And yet it seems implausible. What about the elements of an atomic vector, for instance? A single element doesn’t look at all like a vector: it’s a value, not a sequence of values.

Or so we might think. But really, in R there are no “single values” that can exist by themselves. Consider, for instance, what we think of as the number 17:

[1] 17

See the [1] in front, in the output above? It indicates that the line begins with the first element of a vector. So 17 doesn’t exist on its own: it exists as a vector of type double, specifically a vector of length 1.

Even NA is, all along, a vector of length 1. You can see by running the code below:

NA is of type logical. See for yourself:

Note that even the type of NA evaluates, in R, to a vector: a character vector of length 1 whose only element is the string “logical”!
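The claims above about 17 and NA can be checked with a few short calls. This is a small sketch using only functions already introduced, plus length(), which reports how many elements a vector has:

```r
length(17)           # 1: a "single value" is a vector of length 1
typeof(17)           # "double"

NA                   # prints with a leading [1], like any other vector
length(NA)           # 1
typeof(NA)           # "logical"

# the answer from typeof() is itself a vector: a character vector of length 1
typeof(typeof(NA))   # "character"
length(typeof(NA))   # 1
```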

2.1.6 Named Vectors

The elements of a vector can have names, if we like:

ages <- c(Bettina = 32, Chris = 64, Ramesh = 101)
ages

Bettina   Chris  Ramesh 
     32      64     101

What is the type of this named vector? Let’s find out:

typeof(ages)

[1] "double"

Having names doesn’t keep the vector from being a vector of type double: it has to be double because its elements are of type double.

We can name the elements of a vector when we create it with c(), or we can name them later on. One way to do this is with the names() function:

names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
heights

Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       72        70        69        58        NA        45

2.1.7 Special Character Vectors

R comes with two handy, predefined character vectors. Run this code:

We will make use of them from time to time.
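The two vectors in question are LETTERS and letters, both of which appear in the exercises later in this chapter. A quick look at them:

```r
LETTERS           # the 26 uppercase letters, "A" through "Z"
letters           # the 26 lowercase letters, "a" through "z"

typeof(letters)   # "character"
```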

2.1.8 Length of Vectors

The length() function tells us how many elements a vector has:

length(heights)

[1] 6

2.1.9 Practice Exercises

Tip 2.1: Combining Two Vectors

Problem
Solution

Consider the following vector:

upperLower <- c(LETTERS, letters)

What should the length of upperLower be? Check your answer using the length() function.

There are 26 letters, so the length of upperLower should be $2 \times 26 = 52$. Let’s check:

length(upperLower)

[1] 52

Tip 2.2: Results of Coercion

Problem
Solution

True or False: c("a", 2, TRUE) yields a vector of length three consisting of the string "a", the number 2 and the logical value TRUE.

You can use the following code-field to check your guess:

False! The resulting vector has to be atomic, so all of its elements must be of the same type. One of the elements passed to c() is a string, so all elements will be coerced to type character:

c("a", 2, TRUE)

[1] "a"    "2"    "TRUE"

Tip 2.3: Coercing to a Number

Problem
Solution

The function as.numeric() tries to coerce its input into numbers. How well can it pick out the “numbers” in strings? Try the following calls. When did as.numeric() find the number that was probably intended?

as.numeric("3.214")
as.numeric("3L")
as.numeric("fifty")
as.numeric("10 + 3")
as.numeric("3.25e-3")  # scientific notation:  3.25 times 10^(-3)
as.numeric("31,245")

Try the calls here:

as.numeric() isn’t very smart: it picked out the numbers in "3.214" and "3.25e-3", but in the other cases it returned NA.

2.2 Constructing Patterned Vectors

Quite often we need to make lengthy vectors that follow simple patterns. R has a few functions to assist us in these tasks.

2.2.1 Sequencing

Consider the seq() function:

The default value of the parameter by is 1, so we could get the same thing with:

Further reduction in typing may be achieved as long as we remember the order in which R expects the parameters (from before to, then by if supplied):

Some more complex examples:

R will go up to the to value, but not past it:

Negative steps are fine:

The colon operator : is a convenient abbreviation for seq:

If the from number is greater than the to number the step for the colon operator is -1:
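The points made in this subsection can be illustrated with a short sketch. The particular numbers below are assumptions chosen for illustration; only seq() and the colon operator themselves come from the text:

```r
seq(from = 5, to = 15, by = 1)   # 5 6 7 ... 15
seq(from = 5, to = 15)           # same values: by defaults to 1
seq(5, 15)                       # same values: arguments matched by position

seq(3, 18, 4)                    # 3 7 11 15: the next step, 19, would pass 18
seq(10, 1, by = -3)              # 10 7 4 1: a negative step counts down

5:15                             # abbreviation for seq(5, 15)
15:5                             # from > to, so the step is -1
```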

2.2.2 Repeating

With rep() we may repeat a given vector as many times as we like:

We can apply rep() to a vector of length greater than 1:

rep() applies perfectly well to character-vectors:
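A brief sketch of these three uses of rep() follows. The values being repeated are assumptions chosen for illustration:

```r
rep(3, times = 5)            # 3 3 3 3 3
rep(c(7, 3, 4), times = 2)   # 7 3 4 7 3 4
rep("Toto", times = 3)       # "Toto" "Toto" "Toto"
```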

rep() also takes an each parameter that determines how many times each element of the given vector will be repeated before the times parameter is applied. This is best illustrated with an example:

vec <- c(7, 3, 4)
rep(vec, each = 2, times = 3)

 [1] 7 7 3 3 4 4 7 7 3 3 4 4 7 7 3 3 4 4

If we combine seq() and rep() we can create fairly complex patterns concisely:

vec <- seq(5, -3, -2)
rep(vec, each = 2, times = 2)

 [1]  5  5  3  3  1  1 -1 -1 -3 -3  5  5  3  3  1  1 -1 -1 -3 -3

In order to create fifty 10’s followed by fifty 30’s followed by fifty 50’s I would write the following code. (Try it if you want to verify what it produces.)
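One way to write it, offered as a sketch rather than the only possibility (the variable name patterned is an assumption):

```r
# seq(10, 50, by = 20) gives 10 30 50; each = 50 repeats every element fifty times
patterned <- rep(seq(10, 50, by = 20), each = 50)

length(patterned)   # 150
patterned[50:51]    # 10 30: the switch from 10s to 30s
```

Writing rep(c(10, 30, 50), each = 50) would produce the same values.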

2.2.3 Practice Exercises

Tip 2.4: Repeating with rep()

Problem
Solution

Use rep() to make the following vector:

[1] "Kansas" "Kansas" "Kansas" "Kansas" "Kansas"

Also, use rep() to make this vector:

[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

Here’s how to make the first vector:

rep("Kansas", times = 5)

Here’s how to make the second one:

rep(c(TRUE, FALSE), times = 4)

Tip 2.5: Patterned Vectors with seq()

Problem
Solution

Use seq() to make the following vector:

[1]  5  8 11 14 17 20 23 26

Also, use seq() to make all of the multiples of 4, beginning with 8 and going down to -32. Your vector will print out like this:

 [1]   8   4   0  -4  -8 -12 -16 -20 -24 -28 -32

Here’s how to make the first vector:

seq(5, 26, by = 3)

Here’s how to make the second one:

seq(8, -32, by = -4)

Tip 2.6: Patterned Vectors with the Colon Operator

Problem
Solution

Use the colon operator to make all of the whole numbers from 10 to 20.

Use the colon operator again to make all of the whole numbers from 10 to -30.

Here’s how to make the first vector:

10:20

Here’s how to make the second one:

10:-30

Tip 2.7: From 1 to the Length of a Given Vector

Problem
Solution

You have a vector named myVec that is of length at least one. How could you use the colon operator and the length() function to make all of the whole numbers from 1 up to the length of myVec?

This will do:

1:length(myVec)

For example, if myVec is:

myVec <- c("hello", "there", "Dorothy") # length is 3

Then you get:

1:length(myVec)

[1] 1 2 3

On the other hand, suppose that myVec is of length 0:

## this makes a numeric vector with no elements:
myVec <- numeric()

Then the expression would return the same as 1:0:

1:length(myVec)

[1] 1 0

It goes from 1 down to the length of myVec!

Tip 2.8: rep() and seq() Together

Problem
Solution

Use rep() and seq() together to make the following vector:

 [1]  2  3  4  5  6  7  8  9 10  2  3  4  5  6  7  8  9 10  2  3  4  5  6  7  8
[26]  9 10

Also, use rep() and seq() together to make the following vector:

 [1]  2  2  2  3  3  3  4  4  4  5  5  5  6  6  6  7  7  7  8  8  8  9  9  9 10
[26] 10 10

Here’s how to get the first vector:

rep(seq(2, 10), times = 3)

Here’s how to get the second one:

rep(seq(2, 10), each = 3)

Tip 2.9: More About rep()

Problem
Solution

The help page for rep() tells you that the first argument of rep() is the vector that you want to repeat, and that it’s called x. It goes on to say that times is:

“an integer-valued vector giving the (non-negative) number of times to repeat each element if of length length(x), or to repeat the whole vector if of length 1.”

Use this information to predict what the expression rep(seq(10, 100, by = 10), times = 1:10) returns (and then run it to check your guess).

Here’s how the expression works:

seq(10, 100, by = 10) would give the numbers 10, 20, 30, and so on up to 100, a vector of length 10.
When you rep() this with the argument times = 1:10, it’s the same as saying times = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), so:
- the 10 gets written 1 time;
- the 20 gets written 2 times;
- the 30 gets written 3 times,
- and so on until …
- the 100 gets written 10 times.
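The walkthrough above can be checked directly. This sketch builds the expression from the two pieces named in the explanation (the variable name result is an assumption):

```r
result <- rep(seq(10, 100, by = 10), times = 1:10)

length(result)   # 1 + 2 + ... + 10 = 55
head(result)     # 10 20 20 30 30 30
```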

2.3 Subsetting Vectors

Quite often we need to select one or more elements from a vector. The subsetting operator [ allows us to do this.

Recall the vector heights:

heights

Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       72        70        69        58        NA        45

If we want the fourth element, we ask for it with the subsetting operator like this:

heights[4]

Dorothy 
     58

If we want two or more elements, then we specify their indices in a vector. Thus, to get the first and fifth elements, we might do this:

desired <- c(1,5)
heights[desired]

Scarecrow      Toto 
       72        NA

We could also ask for them directly:

heights[c(1,5)]

Scarecrow      Toto 
       72        NA

Negative numbers are significant in subsetting:

heights[-2] #select all but second element

Scarecrow      Lion   Dorothy      Toto       Boq 
       72        69        58        NA        45

heights[-c(1,3)]  # all but first and third

 Tinman Dorothy    Toto     Boq 
     70      58      NA      45

If you specify a nonexistent index, you get NA, the reasonable result:

heights[7]

<NA> 
  NA

Patterned vectors are quite useful for subsetting. If you want the first three elements of heights, you don’t have to type heights[c(1,2,3)]. Instead you can just say:

heights[1:3]

Scarecrow    Tinman      Lion 
       72        70        69

The following gives the same as heights:

heights[1:length(heights)]

Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       72        70        69        58        NA        45

If you desire to quickly provide names for a vector, subsetting can help. Try this:
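One way this might look, as a sketch with a made-up vector (the name temps and its values are assumptions): subset the predefined letters vector to get exactly as many names as there are elements.

```r
temps <- c(61, 58, 70, 66)                 # hypothetical values
names(temps) <- letters[1:length(temps)]   # names "a" through "d"
temps
```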

If a vector has names we can refer to its elements using the subsetting operator and those names:

heights["Tinman"]

Tinman 
    70

heights[c("Scarecrow", "Boq")]

Scarecrow       Boq 
       72        45

Finally, we can use subsetting to modify parts of a vector. For example, Dorothy’s height is reported as:

heights["Dorothy"]

Dorothy 
     58

If Dorothy grows two inches, then we can modify her height as follows:

heights["Dorothy"] <- 60

We can replace more than one element, of course. Thus:

heights[c("Scarecrow", "Boq")] <- c(73, 46)

The subset of indices may be as complex as you like. Try this:

In the above example, seq(2,6,2) identified 2, 4 and 6 as the indices of elements of vec that were to be replaced by the corresponding elements of c(100, 200, 300).
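A sketch of a replacement of this kind follows. The starting values of vec are an assumption; the index expression and the replacement values are the ones described above:

```r
vec <- c(3, 4, 5, 6, 7, 8)              # hypothetical starting values
vec[seq(2, 6, 2)] <- c(100, 200, 300)   # replace elements 2, 4 and 6
vec                                     # 3 100 5 200 7 300
```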

We can even use subsetting to rearrange the elements of a vector. Try the example below:
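As a small sketch of rearranging (the vector inhabitants is an assumption for illustration), the order of the indices determines the order of the result:

```r
inhabitants <- c("Dorothy", "Toto", "Scarecrow")   # hypothetical vector

inhabitants[c(3, 1, 2)]   # "Scarecrow" "Dorothy" "Toto"
inhabitants[c(2, 2, 1)]   # "Toto" "Toto" "Dorothy": an index may be repeated
```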

2.3.1 Practice Exercises

Tip 2.10: Basic Subsetting Drills

Problem
Solution

In this exercise we’ll work with:

practice_vector <- c(4, 3, 7, 10, 5, 3, 8)

You would like to select:

the fifth element of practice_vector;
the third and sixth elements of practice_vector;
the first, second, third and fourth elements of practice_vector;
all except the fourth element of practice_vector;
all except the fourth and sixth elements of practice_vector;
the even-numbered elements of practice_vector.

You can work in the code-field below:

The fifth element of practice_vector:

practice_vector[5]

[1] 5

The third and sixth elements of practice_vector:

practice_vector[c(3, 6)]

[1] 7 3

The first, second, third and fourth elements of practice_vector:

practice_vector[1:4]

[1]  4  3  7 10

All except the fourth element of practice_vector:

practice_vector[-4]

[1] 4 3 7 5 3 8

All except the fourth and sixth elements of practice_vector:

practice_vector[-c(4, 6)]

[1] 4 3 7 5 8

The even-numbered elements of practice_vector:

practice_vector[c(2, 4, 6)]

[1]  3 10  3

Tip 2.11: Reversing Our Practice Vector

Problem
Solution

In this exercise we continue to work with:

practice_vector <- c(4, 3, 7, 10, 5, 3, 8)

How could you reverse the elements of practice_vector so as to get:

[1]  8  3  5 10  7  3  4

By now you have probably noticed that practice_vector has seven elements, so you could specify the numbers from 7 down to 1:

practice_vector[c(7, 6, 5, 4, 3, 2, 1)]

Alternatively, the colon operator helps you to be more concise:

practice_vector[7:1]

Tip 2.12: Reversing an Arbitrary Vector

Problem
Solution

Suppose that you want to reverse the elements of the vector mystery_vector, but you don’t know how many elements it has. You only know that it has at least one element. How can you accomplish this?

In the previous exercise we reversed practice_vector with:

practice_vector[7:1]

Since we don’t know how many elements mystery_vector has, we can ask for it with the length() function:

mystery_vector[length(mystery_vector):1]

Note: In the previous exercise we could have done the same with practice_vector:

practice_vector[length(practice_vector):1]

[1]  8  3  5 10  7  3  4

Tip 2.13: Selecting All the Even Elements from an Arbitrary Vector

Problem
Solution

Suppose that you want to select all of the even-numbered elements of the vector mystery_vector, but you don’t know how many elements it has. You only know that it has at least two elements. How can you accomplish this?

This will do:

mystery_vector[seq(2, length(mystery_vector), by = 2)]

For example, if mystery_vector were all of the lowercase letters then we could write:

letters[seq(2, length(letters), by = 2)]

 [1] "b" "d" "f" "h" "j" "l" "n" "p" "r" "t" "v" "x" "z"

Tip 2.14: Replacing Elements

Problem
Solution

In this exercise we use:

another_vector <- c(10, 7, 10, -4, 10, 12, 10, 18)

Replace the first 10 of another_vector with 1, the second with 2, and so on.

We could do this:

another_vector[c(1, 3, 5, 7)] <- c(1, 2, 3, 4)

However, observing that the 10s occur precisely in the odd-numbered places, we could also solve the problem more generally:

odd_elements <- seq(1, length(another_vector), by = 2)
another_vector[odd_elements] <- 1:length(odd_elements)

Either way, the result is:

another_vector

[1]  1  7  2 -4  3 12  4 18

2.4 More on Logical Vectors

Run the following expression:

We constructed it with the “less-than” operator <. You can think of it as saying 13 is less than 20, which is a true statement, and sure enough, R evaluates the expression 13 < 20 as TRUE.

When you think about it, we’ve seen lots of expressions so far. Here are just a few of them:

sqrt(64)
heights
heights[1:3]
13 < 20

When we type any one of them into the console, it evaluates to a particular value. In the examples above, the value was always a vector.

Expressions like 13 < 20 that evaluate to a logical vector are often called Boolean expressions.¹

2.4.1 Boolean Operators

Let’s look further into Boolean expressions. Define the following two vectors:

a <- c(10, 13, 17)
b <- c(8, 15, 12)

Now let’s evaluate the expression a < b:

a < b

[1] FALSE  TRUE FALSE

The < operator, when applied to vectors, always works element-wise; that is, it is applied to corresponding elements of the vectors on either side of it. R’s evaluation of a < b involves evaluation of the following three expressions:

10 < 8 (evaluates to FALSE)
13 < 15 (evaluates to TRUE)
17 < 12 (evaluates to FALSE)

The result is a logical vector of length 3.

The < operator is an example of a Boolean operator in R. Table 2.1 shows the available Boolean operators.

Table 2.1: The Boolean Operators

Operation	What It Means
<	less than
>	greater than
<=	less than or equal to
>=	greater than or equal to
==	equal to
&	and
|	or
!	not

2.4.1.1 Inequalities

The “numerical-looking operators” (<, <=, >, >=) have their usual meanings when one is working with numerical vectors.² When applied to character vectors they evaluate according to an alphabetical order. Try this:

The reasons for the evaluation above are as follows:

D comes before t in the alphabet;
lowercase t comes before uppercase T, according to R;
characters for numbers come before letter-characters, according to R.
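Comparisons of the kind listed above might look like the sketch below. The particular strings are assumptions. Note also that R compares strings using the collating sequence of the current locale, so results involving upper and lower case can differ between systems; in the "C" locale, for instance, uppercase letters sort before lowercase ones. For that reason no expected results are shown here.

```r
"Dorothy" < "toto"   # D versus t

# lowercase versus uppercase: the answer depends on the locale
caseComparison <- "t" < "T"
caseComparison

"4" < "four"         # a digit character versus a letter

# shows which collating rules are in use on this system
Sys.getlocale("LC_COLLATE")
```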

2.4.1.2 Equality

The equality (==) operator indicates whether the expressions being compared evaluate to the same value. Note that it’s made with two equal-signs, not one! It’s all about evaluation to the same value, not strict identity. The following examples will help to clarify this (run the code):

(Note that the resulting logical vector inherits the names of a, the vector on the left.)

But a and b aren’t identical. We can see this because R has the function identical() to test for identity:

Corresponding elements of a and b have the same values, but the two vectors don’t have the same set of names, so they aren’t considered identical.

Here’s another way to see that “evaluating to the same value” is not the same as “identity”. Try this:

When TRUE (itself of type logical) is being compared with something numerical (type integer or double) it is coerced into the numerical vector 1. (In the same situation FALSE would be coerced to 0.) But TRUE and 1 are not identical, as you can verify by running the code below:
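The difference between == and identical() can be sketched as follows. The vectors left and right, and their names, are assumptions made up for illustration:

```r
left  <- c(x = 10, y = 13, z = 17)
right <- c(p = 10, q = 13, r = 17)

left == right            # all TRUE, and the result carries the names x, y, z
identical(left, right)   # FALSE: same values, different names

TRUE == 1                # TRUE: TRUE is coerced to 1 before the comparison
identical(TRUE, 1)       # FALSE: one is logical, the other double
```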

2.4.1.3 And, Or, Not

We consider an “and” statement to be true when both of its component statements are true; otherwise it is counted as false. The & Boolean operator accords with our thinking (try this):

In logic and mathematics, an “or” statement is considered to be true when at least one of its component statements is true. (This is sometimes called the “inclusive” use of the term “or.”) R accords with this line of thinking (try this):

The final Boolean operator is !, which works like “not” (try these):
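A compact sketch of the three operators on single values, followed by one element-wise case:

```r
TRUE & TRUE     # TRUE
TRUE & FALSE    # FALSE: both parts must be true

TRUE | FALSE    # TRUE: one true part is enough
FALSE | FALSE   # FALSE

!TRUE           # FALSE
!(13 < 20)      # FALSE, since 13 < 20 is TRUE

# like <, these operators work element-wise on longer vectors
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)   # TRUE FALSE FALSE
```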

2.4.2 Vector Recycling

Consider the vector

vec <- c(2, 6, 1, 7, 3)

Look at what happens when we evaluate the expression:

vec > 4

[1] FALSE  TRUE FALSE  TRUE FALSE

At first blush this doesn’t make any sense: vec has length 5, whereas 4 is a vector of length 1. How can the two of them be compared?

They cannot, in fact, be compared directly. Instead the shorter of the two vectors, the 4, is recycled into c(4,4,4,4,4), a vector of length five, which may then be compared element-wise with vec. Recycling is a great convenience as it allows us to express an idea clearly and concisely.

Recycling is always performed on the shorter of two vectors. Consider the example below:

vec2 <- 1:6
vec2 > c(3,1)

[1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE

Here, c(3,1) was recycled into c(3,1,3,1,3,1) prior to being compared with vec2.

What happens if the length of the longer vector is not a multiple of the shorter one? We should look into this:

vec2 <- 1:7
vec2 > c(3, 8)

## longer object length is not a multiple of shorter object length
## [1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE

We get a warning, but R tries to do the job for us anyway, recycling the shorter vector to c(3,8,3,8,3,8,3) and then performing the comparison.

By the way, if you don’t want to see the warning you can put the expression into the suppressWarnings() function:

suppressWarnings(vec2 > c(3, 8))

[1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE

2.4.3 Practice Exercises

Tip 2.15: Expressions with Boolean Operators

Problem
Solutions

In this exercise you’ll work with the following vectors:

person <- c("Dorothy", "Scarecrow", "Tin Man", "Lion", "Toto")
age <- c(12, 0.04, 15, 18, 6)
likesDogs <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

Think of the vectors as having corresponding elements. Thus, there is a person named Dorothy who is 12 years old and likes dogs, a person named Scarecrow who is 0.04 years old and doesn’t like dogs, etc.

Write a Boolean expression that is TRUE when a person is less than 14 years old and FALSE otherwise.
Write a Boolean expression that is TRUE when a person is between 10 and 15 years old (including 10 but not 15) and FALSE otherwise.
Write a Boolean expression that is TRUE when a person is more than 12 years old and likes dogs, and FALSE otherwise.
Write a Boolean expression that is TRUE when a person is more than 12 years old and does not like dogs, and FALSE otherwise.
Write a Boolean expression that is TRUE when a person is more than 12 years old or likes dogs, and FALSE otherwise.
Write a Boolean expression that is TRUE when the person is Dorothy, and FALSE otherwise.
Write a Boolean expression that is TRUE when the person is Dorothy or Tin Man, and FALSE otherwise.
Write a Boolean expression that is TRUE when the person’s name comes after the letter “M” in the alphabet, and FALSE otherwise.
Write a Boolean expression that is FALSE when the person is Dorothy, and TRUE otherwise.

You can work in the code-field below:

A Boolean expression that is TRUE when a person is less than 14 years old and FALSE otherwise:

age < 14

[1]  TRUE  TRUE FALSE FALSE  TRUE

A Boolean expression that is TRUE when a person is between 10 and 15 years old (including 10 but not 15) and FALSE otherwise:

age >= 10 & age < 15

[1]  TRUE FALSE FALSE FALSE FALSE

A Boolean expression that is TRUE when a person is more than 12 years old and likes dogs, and FALSE otherwise:

age > 12 & likesDogs

[1] FALSE FALSE  TRUE FALSE FALSE

You could also do it like this:

age > 12 & likesDogs == TRUE

[1] FALSE FALSE  TRUE FALSE FALSE

But the second solution is needlessly roundabout, as likesDogs == TRUE evaluates to the very same logical vector as likesDogs.

A Boolean expression that is TRUE when a person is more than 12 years old and does not like dogs, and FALSE otherwise:

age > 12 & !likesDogs

[1] FALSE FALSE FALSE  TRUE FALSE

A Boolean expression that is TRUE when a person is more than 12 years old or likes dogs, and FALSE otherwise:

age > 12 | likesDogs

[1]  TRUE FALSE  TRUE  TRUE  TRUE

A Boolean expression that is TRUE when the person is Dorothy, and FALSE otherwise:

person == "Dorothy"

[1]  TRUE FALSE FALSE FALSE FALSE

A Boolean expression that is TRUE when the person is Dorothy or Tin Man, and FALSE otherwise:

person == "Dorothy" | person == "Tin Man"

[1]  TRUE FALSE  TRUE FALSE FALSE

A Boolean expression that is TRUE when the person’s name comes after the letter “M” in the alphabet, and FALSE otherwise:

person > "M"

[1] FALSE  TRUE  TRUE FALSE  TRUE

A Boolean expression that is FALSE when the person is Dorothy, and TRUE otherwise:

person != "Dorothy"

[1] FALSE  TRUE  TRUE  TRUE  TRUE

You could also do it like this (though the expression is a bit harder to read):

!(person == "Dorothy")

[1] FALSE  TRUE  TRUE  TRUE  TRUE

2.5 Subsetting with Logical Vectors

The subsetting we have seen up to now involves specifying the indices of the elements we would like to select from the original vector. It is also possible to say, for each element, whether or not it is to be included in our selection. This is accomplished by means of logical vectors.

Recall our heights vector:

heights

Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       73        70        69        60        NA        46

Let’s say that we want the heights of Scarecrow, Tinman and Dorothy. We can use a logical vector to do this:

wanted <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
heights[wanted]

Scarecrow    Tinman   Dorothy 
       73        70        60

The TRUE’s at indices 1, 2, and 4 in wanted inform R that we want the heights vector at indices 1, 2 and 4. The FALSE’s say: “don’t include this element!”

Subsetting can be used powerfully along with logical vectors and Boolean operators.

For example, in order to select those persons whose heights exceed a certain amount, we might say something like this:

#heights of some people:
people <- c(55, 64, 67, 70, 63, 72)
tall <- (people >= 70)
tall

[1] FALSE FALSE FALSE  TRUE FALSE  TRUE

people[tall]

[1] 70 72

As you can see, the tall vector specifies which elements we would like to select from the people vector.

We need not define the tall vector along the way. It is quite common to see something like the following:

people[people >= 70]

[1] 70 72

I like to pronounce the above as:

people, where people is at least 70

The word “where” in the above phrase corresponds to the subsetting operator.

Your subsetting logical vector need not have been constructed with the original vector in mind. Try the following example:

Here the selection is done from the age vector, using a logical vector that was constructed from height—another vector altogether. It concisely expresses the idea:

the ages of people whose height is less than 70
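A sketch of this idea, with made-up data (the values in height and age are assumptions; corresponding elements describe the same person):

```r
height <- c(72, 66, 69, 75)   # hypothetical heights, in inches
age    <- c(30, 41, 25, 58)   # hypothetical ages of the same four people

height < 70                   # FALSE TRUE TRUE FALSE
age[height < 70]              # 41 25
```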

There is no limit to the complexity of selection. Try the following:
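For instance, conditions can be joined with the Boolean operators before subsetting. The sketch below uses made-up height and age vectors (their values are assumptions for illustration):

```r
height <- c(72, 66, 69, 75)   # hypothetical heights, in inches
age    <- c(30, 41, 25, 58)   # hypothetical ages of the same four people

# ages of people who are under 70 inches and over 30 years old
age[height < 70 & age > 30]          # 41

# heights of people who are at least 30 and not taller than 73 inches
height[age >= 30 & !(height > 73)]   # 72 66
```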

2.5.1 Counting

Logical subsetting provides a convenient way to count the elements of a vector that possess a given property. For example, think back to the vector people, which gave the heights of some people:

people <- c(55, 64, 67, 70, 63, 72)

In order to find out how many people are less than 70 inches tall, we could say:

length(people[people < 70])

[1] 4

But there is an easier way, made possible by how R coerces logical vectors into numerical ones. Consider the following logical vector:

logical_vector <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

Now suppose we decide to add up those TRUEs and FALSEs. It seems like a crazy idea, but R actually gives us an answer:

sum(logical_vector)

[1] 3

What happened here? Well, R needs numbers as input for the sum() function. When it doesn’t get numbers, it does the best it can to coerce its input into numbers. It so happens that R is programmed to coerce TRUE to the number 1 and FALSE to the number 0. So the command sum(logical_vector) really amounts to:

sum(c(1, 1, 0, 0, 0, 1))

[1] 3

The upshot is that if you ask R to sum a logical vector, the result is just a count of how many TRUEs it contained.

Back to people. people < 70 is a Boolean expression, so its result is a logical vector:

people < 70

[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

Summing this vector counts how many times it was true that the height was less than 70:

sum(people < 70)

[1] 4

That’s very convenient!

2.5.2 Which, Any, All

There are several functions on logical vectors that are worth keeping in your back pocket:

which()
any()
all()

2.5.2.1 `which()`

Applied to a logical vector, the which() function returns the indices of the vector that have the value TRUE. Try this:
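A minimal sketch, reusing the logical vector from Section 2.5.1 (the name true_positions is just an illustrative choice):

```r
logical_vector <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# Positions (indices) at which the value is TRUE
true_positions <- which(logical_vector)
true_positions  # 1 2 6
```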

Thus if we want to know the indices of heights where the heights are more than 65, then we write:

which(heights > 65)

Scarecrow    Tinman      Lion 
        1         2         3

(Recall that heights was a named vector. The logical vector heights > 65 inherited these names and passed them on to the result of which().)

Note also that Toto’s NA height was ignored by which().

2.5.2.2 `any()` and `%in%`

Is anyone more than 71 inches tall? any() will tell us:

heights

Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       73        70        69        60        NA        46

any(heights > 71)

[1] TRUE

Yes: the Scarecrow is more than 71 inches tall.

We can use any() along with the equality Boolean operator == to determine whether or not a given value appears in a given vector. Try this:
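A small sketch of the idea, using a made-up vector called numbers (an assumption for illustration):

```r
numbers <- c(4, 7, 2, 9)

# Does 7 appear anywhere in numbers?
has_seven <- any(numbers == 7)
has_seven  # TRUE

# Does 5 appear anywhere in numbers?
has_five <- any(numbers == 5)
has_five   # FALSE
```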

The above question occurs so frequently that R provides the %in% operator as a short-cut. Try these examples:
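A minimal sketch with the same kind of made-up vector (numbers is an illustrative name, not one from earlier in the chapter). Note that the left-hand side of %in% may itself be a vector, in which case each of its elements is checked in turn:

```r
numbers <- c(4, 7, 2, 9)

7 %in% numbers  # TRUE
5 %in% numbers  # FALSE

# One answer for each element on the left-hand side
found <- c(2, 5, 9) %in% numbers
found  # TRUE FALSE TRUE
```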

2.5.2.3 `all()`

Is everyone more than 71 inches tall?

all(heights > 71)

[1] FALSE

2.5.3 Practice Exercises

Tip 2.16: Subsetting with Logical Vectors

Problem
Solutions

Consider the following vectors:

person <- c(
  "Abe", "Bettina", "Candace", 
  "Devadatta", "Esmeralda", "Francis"
)
numberKids <- c(2, 1, 0, 2, 3, 4)
yearsEducation <- c(12, 16, 13, 14, 18, 15)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)

Think of these vectors as providing information about six people. (For example, Abe has 2 children and 12 years of education and does not have pets.)

Write an expression that evaluates to the names of people who have more than 1 child.
Write an expression that evaluates to the numbers of children of people who have a pet.
Write an expression that evaluates to the years of education of people who have at least 13 years of education.
Write an expression that evaluates to the names of people who have more than one child and fewer than 15 years of education.
Write an expression that evaluates to the names of people who don’t have pets.
Write an expression that evaluates to the number of people who have pets.
Write an expression that evaluates to the number of people who don’t have pets.
Write an expression that evaluates to TRUE exactly when there is someone who has more than 15 years of education and at least one child, but doesn’t have any pets.
Write an expression that evaluates to TRUE exactly when every person has more than 13 years of education.

Write an expression that evaluates to the indices of the person vector corresponding to the people who have more than 13 years of education.


An expression that evaluates to the names of people who have more than 1 child:

person[numberKids > 1]

[1] "Abe"       "Devadatta" "Esmeralda" "Francis"

An expression that evaluates to the numbers of children of people who have a pet:

numberKids[hasPets]

[1] 0 2 4

An expression that evaluates to the years of education of people who have at least 13 years of education:

yearsEducation[yearsEducation >= 13]

[1] 16 13 14 18 15

An expression that evaluates to the names of people who have more than one child and fewer than 15 years of education:

person[numberKids > 1 & yearsEducation < 15]

[1] "Abe"       "Devadatta"

An expression that evaluates to the names of people who don’t have pets:

person[!hasPets]

[1] "Abe"       "Bettina"   "Esmeralda"

For an expression that evaluates to the number of people who have pets, we could use:

length(person[hasPets])

[1] 3

However, using the ideas of Section 2.5.1, we could get it more quickly as:

sum(hasPets)

[1] 3

For an expression that evaluates to the number of people who don’t have pets, we could use:

length(person[!hasPets])

[1] 3

But using the ideas of Section 2.5.1, we could get it more quickly as:

sum(!hasPets)

[1] 3

An expression that evaluates to TRUE exactly when there is someone who has more than 15 years of education and at least one child, but doesn’t have any pets:

any(yearsEducation > 15 & numberKids >= 1 & !hasPets)

[1] TRUE

An expression that evaluates to TRUE exactly when every person has more than 13 years of education:

all(yearsEducation > 13)

[1] FALSE

An expression that evaluates to the indices of the person vector corresponding to the people who have more than 13 years of education:

which(yearsEducation > 13)

[1] 2 4 5 6

2.6 Arithmetical Operations on Vectors

2.6.1 Familiar Arithmetic: Addition, Subtraction, etc.

R provides all of the familiar arithmetical operations. Table 2.2 shows the basic operators.

Table 2.2: Familiar arithmetical operations on vectors.

Operation	What It Means
x + y	addition
x - y	subtraction
x * y	multiplication
x / y	division
x^y	exponentiation (raise x to the power y)

The operators are applied element-wise to vectors. Try these examples:
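A minimal sketch of element-wise arithmetic. The vectors x and y are assumptions, chosen so that the last line matches the illustration that follows:

```r
x <- c(10, 15, 20)
y <- c(3, 4, 5)

x + y  # 13 19 25
x - y  # 7 11 15
x * y  # 30 60 100
x / y  # each element of x divided by the corresponding element of y

powers <- x^y
powers # 1000 50625 3200000
```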

As an illustration, the final result above is:

\[10^3, 15^4, 20^5.\]

The “mod” operator %% can be quite useful. Here is an example: even numbers have a remainder of 0 after division by 2, whereas odd numbers have a remainder of 1. Hence we may use %% to quickly locate the even numbers in a vector, as follows. Try this:
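A minimal sketch, with a made-up vector of whole numbers (the name and values are assumptions for illustration):

```r
some_whole_numbers <- c(3, 8, 12, 7, 10, 5)

# Remainders after division by 2: 0 for even, 1 for odd
some_whole_numbers %% 2  # 1 0 0 1 0 1

# Keep only the elements whose remainder is 0
evens <- some_whole_numbers[some_whole_numbers %% 2 == 0]
evens  # 8 12 10
```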

Recycling applies in vector arithmetic (as in most of R). Try this:
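A small sketch of recycling in arithmetic, using made-up values. In the first case the single number 10 is reused for every element; in the second, the shorter vector c(1, 100) is reused twice:

```r
short_vec <- c(2, 4, 6, 8)

plus_ten <- short_vec + 10
plus_ten    # 12 14 16 18

alternating <- short_vec * c(1, 100)
alternating # 2 400 6 800
```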

2.6.1.1 Vectorization

All of the above operations implement a “vector-in, vector-out” principle—referred to by R users as vectorization. Not only does vectorization permit us to express ideas concisely and in human-readable fashion, but the computations themselves tend to be performed very quickly.

The sqrt() function for taking square roots, which you met in the first Chapter, also implements vectorization. For example, if you need the square roots of all of the numbers in vec, then you can just write (try it):
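A minimal sketch, assuming that vec holds a few perfect squares (the values are an assumption, picked so that the roots are easy to check by eye):

```r
vec <- c(1, 4, 9, 16, 25)

# sqrt() is applied to every element at once
roots <- sqrt(vec)
roots  # 1 2 3 4 5
```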

2.6.2 Quotient (`%/%`) and Remainder (`%%`)

R provides two special operations that are related to division:

Basic arithmetical operations on vectors.
Operation	What It Means
x %/% y	integer division (quotient after dividing x by y)
x %% y	x mod y (remainder after dividing x by y)

Let’s review the ideas of quotient and remainder:

Suppose you divide 17 by 5.
The quotient is 3 …
- … because $3 \times 5 = 15 \leq 17$, but
- $4 \times 5 = 20 > 17$.
The remainder is 2 …
- … because $17 - 15 = 2$.

The %/% operator finds the quotient for you. Try this:
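For example, using the numbers from the review above, plus one case where the division comes out evenly:

```r
17 %/% 5  # 3, since 3 * 5 = 15 is the largest multiple of 5 not exceeding 17
20 %/% 5  # 4
```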

The %% operator in R finds the remainder. Try it:
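For example, with the same numbers as before:

```r
17 %% 5  # 2, since 17 - 15 = 2
20 %% 5  # 0, since 5 divides 20 evenly
```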

Note: because remainders are associated with an area of advanced mathematics called modular arithmetic, people often pronounce the %% operator as “mod”. Thus, 17 %% 5 would be pronounced as “seventeen mod 5”.

Both of these operators implement vectorization, so, for example, the %% finds the remainders for all elements of any dividend-vector. Try this:
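A minimal sketch, where dividends is a made-up vector (an assumption for illustration):

```r
dividends <- c(17, 20, 23, 4)

remainders <- dividends %% 5
remainders  # 2 0 3 4

quotients <- dividends %/% 5
quotients   # 3 4 4 0
```

As a check, quotients * 5 + remainders gives back the original dividends.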

As an application, let’s find all the numbers in vec that are multiples of 6. It’s a two-step process:

First, find out whether or not each number has a remainder of 0 when divided by 6:

vec <- 5:20 
is_multiple_of_6 <- vec %% 6 == 0
is_multiple_of_6

 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
[13] FALSE  TRUE FALSE FALSE

Second, select from vec with logical sub-setting:

vec[is_multiple_of_6]

[1]  6 12 18

Of course it is perfectly fine to do it all in one line:

vec[vec %% 6 == 0]

[1]  6 12 18

We would pronounce this as: “vec, where vec mod 6 is 0”.

Indeed, the mod-operator comes in handy quite often. Here are several more quick applications:

vec[vec %% 2 == 0] # just the even numbers in vec

[1]  6  8 10 12 14 16 18 20

vec[vec %% 2 == 1] # just the odd numbers in vec

[1]  5  7  9 11 13 15 17 19

vec[vec %% 3 == 1] # numbers one more than a multiple of 3

[1]  7 10 13 16 19

2.6.3 Summing (`sum()`) and the Mean (`mean()`)

There are some functions on vectors that return only a vector of length 1. Among the examples we have met so far are:

length()
any()
all()

We have also met sum(), which returns the sum of all of the elements of a vector that it is given:

first_hundred_numbers <- 1:100
sum(first_hundred_numbers)

[1] 5050

In statistics we are often interested in the mean of a list of numbers. The mean is defined as:

\[\frac{\text{sum of the numbers}}{\text{how many numbers there are}}\]

You can find the mean of a numerical vector as follows:

vec <- c(-3, 4, 17, 23, 51)
meanVec <- sum(vec)/length(vec)

The way we compute the mean in R looks a great deal like its mathematical definition.

You might be interested to know that there is a function in R dedicated to finding the mean. Unsurprisingly, it is called mean():

mean(vec)

[1] 18.4

2.6.4 Maxima and Minima

The max() function delivers the maximum value of the elements of a numerical vector. Try this:
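A minimal sketch, with a made-up vector called scores (an assumption for illustration):

```r
scores <- c(3, 12, -5, 8)

largest <- max(scores)
largest  # 12
```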

The min() function delivers the minimum value of a numerical vector. Try this:
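A minimal sketch, again with a made-up scores vector:

```r
scores <- c(3, 12, -5, 8)

smallest <- min(scores)
smallest  # -5
```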

You can enter more than one vector into min() or max(): the function will combine the vectors and then do its job. Try this:
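A small sketch with two made-up vectors, a and b (assumptions for illustration). The result is a single number, found among all of the elements of both vectors:

```r
a <- c(3, 12, -5)
b <- c(40, 0, 7)

overall_max <- max(a, b)  # same as max(c(a, b))
overall_max  # 40

overall_min <- min(a, b)
overall_min  # -5
```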

The pmax() function compares corresponding elements of each input-vector and produces a vector of the maximum values. Try this:
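A minimal sketch with two made-up vectors of the same length (the names a and b are assumptions). Unlike max(), the result has one element per position:

```r
a <- c(3, 12, -5)
b <- c(40, 0, 7)

# Larger of 3 and 40, larger of 12 and 0, larger of -5 and 7
pairwise_max <- pmax(a, b)
pairwise_max  # 40 12 7
```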

There is a pmin() function that computes pair-wise minima as well. Try this:
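A minimal sketch, with the same kind of made-up vectors:

```r
a <- c(3, 12, -5)
b <- c(40, 0, 7)

# Smaller of 3 and 40, smaller of 12 and 0, smaller of -5 and 7
pairwise_min <- pmin(a, b)
pairwise_min  # 3 0 -5
```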

2.6.5 Practice Exercises

Tip 2.17: Vectors of Arithmetical Results

Problem
Solutions

Write a command that produces the squares of the first 10 whole numbers.
Write a command that produces the square roots of: the numbers from 1 to 100 that are two more than a multiple of 3. (Hint: Recall that the function sqrt() will take the square roots of the elements of any numerical vector).
Write a command that raises 2 to the second power, 3 to the third power, 4 to the fourth power, … , up to 100 to the hundredth power.


A command that produces the squares of the first 10 whole numbers:

(1:10)^2

 [1]   1   4   9  16  25  36  49  64  81 100

Note: The parentheses are important! If you leave them out, the squaring will only apply to the 10, so you’ll get just the numbers from 1 to 100:

1:10^2

  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100

That’s not what you wanted!

Here’s how to get a command that produces the square roots of the numbers from 1 to 100 that are two more than a multiple of 3. First, note that you can get the numbers from 1 to 100 that are two more than a multiple of 3 with the seq() function:

seq(2, 100, by = 3)

 [1]  2  5  8 11 14 17 20 23 26 29 32 35 38 41 44 47 50 53 56 59 62 65 68 71 74
[26] 77 80 83 86 89 92 95 98

Notice that we started with 2, because

\[2 = 0 \times 3 + 2\]

so it is two more than a multiple of 3. Going up by 3 gives us the subsequent numbers that are two more than a multiple of 3.

If we apply the sqrt() function to this vector, we’ll get the square roots that we wanted:

sqrt(seq(2, 100, by = 3))

 [1] 1.414214 2.236068 2.828427 3.316625 3.741657 4.123106 4.472136 4.795832
 [9] 5.099020 5.385165 5.656854 5.916080 6.164414 6.403124 6.633250 6.855655
[17] 7.071068 7.280110 7.483315 7.681146 7.874008 8.062258 8.246211 8.426150
[25] 8.602325 8.774964 8.944272 9.110434 9.273618 9.433981 9.591663 9.746794
[33] 9.899495

Alternative Solution: If we want to use the modulus-operator, we could make the numbers from 1 to 100 and then subset to get just the ones that are two more than a multiple of 3, like this:

(1:100)[1:100 %% 3 == 2]

 [1]  2  5  8 11 14 17 20 23 26 29 32 35 38 41 44 47 50 53 56 59 62 65 68 71 74
[26] 77 80 83 86 89 92 95 98

So we can get what we need by taking the square roots:

sqrt((1:100)[1:100 %% 3 == 2])

 [1] 1.414214 2.236068 2.828427 3.316625 3.741657 4.123106 4.472136 4.795832
 [9] 5.099020 5.385165 5.656854 5.916080 6.164414 6.403124 6.633250 6.855655
[17] 7.071068 7.280110 7.483315 7.681146 7.874008 8.062258 8.246211 8.426150
[25] 8.602325 8.774964 8.944272 9.110434 9.273618 9.433981 9.591663 9.746794
[33] 9.899495

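A command that raises 2 to the second power, 3 to the third power, 4 to the fourth power, … , up to 100 to the hundredth power (the command below also computes $1^1 = 1$ as its first element; use (2:100)^(2:100) to leave it out):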

(1:100)^(1:100)

  [1]  1.000000e+00  4.000000e+00  2.700000e+01  2.560000e+02  3.125000e+03
  [6]  4.665600e+04  8.235430e+05  1.677722e+07  3.874205e+08  1.000000e+10
 [11]  2.853117e+11  8.916100e+12  3.028751e+14  1.111201e+16  4.378939e+17
 [16]  1.844674e+19  8.272403e+20  3.934641e+22  1.978420e+24  1.048576e+26
 [21]  5.842587e+27  3.414279e+29  2.088047e+31  1.333736e+33  8.881784e+34
 [26]  6.156120e+36  4.434265e+38  3.314552e+40  2.567686e+42  2.058911e+44
 [31]  1.706917e+46  1.461502e+48  1.291100e+50  1.175664e+52  1.102507e+54
 [36]  1.063874e+56  1.055513e+58  1.075912e+60  1.125951e+62  1.208926e+64
 [41]  1.330878e+66  1.501309e+68  1.734377e+70  2.050774e+72  2.480636e+74
 [46]  3.068035e+76  3.877924e+78  5.007021e+80  6.600972e+82  8.881784e+84
 [51]  1.219211e+87  1.706766e+89  2.435685e+91  3.542118e+93  5.247445e+95
 [56]  7.916432e+97 1.215813e+100 1.900306e+102 3.021821e+104 4.887368e+106
 [61] 8.037481e+108 1.343646e+111 2.282730e+113 3.940201e+115 6.908252e+117
 [66] 1.229985e+120 2.223370e+122 4.079492e+124 7.596040e+126 1.435036e+129
 [71] 2.750064e+131 5.344902e+133 1.053341e+136 2.104492e+138 4.261817e+140
 [76] 8.746474e+142 1.818804e+145 3.831590e+147 8.175987e+149 1.766847e+152
 [81] 3.866220e+154 8.565168e+156 1.920798e+159 4.359734e+161 1.001403e+164
 [86] 2.327377e+166 5.472364e+168 1.301593e+171 3.131198e+173 7.617735e+175
 [91] 1.873988e+178 4.661011e+180 1.171964e+183 2.978642e+185 7.651428e+187
 [96] 1.986270e+190 5.210246e+192 1.380878e+195 3.697296e+197 1.000000e+200

Tip 2.18: Subsetting with Arithmetic

Problem
Solutions

We’ll work with vectors from a previous set of practice exercises:

person <- c(
  "Abe", "Bettina", "Candace", 
  "Devadatta", "Esmeralda", "Francis"
)
numberKids <- c(2, 1, 0, 2, 3, 4)
yearsEducation <- c(12, 16, 13, 14, 18, 15)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)

Using the sum() function and the vector hasPets, write a command that says how many people have pets.
Using the sum() function and the vector hasPets, write a command that says how many people do not have pets.
Using the vectors above, find the name of the person who has the most education.


A command that says how many people have pets:

sum(hasPets)

[1] 3

A command that says how many people do not have pets:

sum(!hasPets)

[1] 3

The name of the person who has the most education:

person[yearsEducation == max(yearsEducation)]

[1] "Esmeralda"

2.7 Legal Variable Names

Using the assignment operator we have created quite a few variables by now, and we appear to have named them whatever we want. In fact there are very few limitations on the name of a variable. According to R’s own documentation:³

“A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.”

This means:

Note

In R, a legal name:

CAN contain letters, both lower and uppercase: a through z and also A through Z;
CAN contain the digits 0 through 9;
CAN contain the dot . and the underscore _.
HOWEVER, it must start with either a letter or the dot, AND
- if it starts with a dot then that dot cannot be immediately followed by a number.

This leaves a lot of room for creativity. All of the following names are possible for variables:

yellowBrickRoad
yellow_brick_road
yellow.brick.road
yell23
y2e3L45Lo3....rOAD
.yellow

The following, though, are not valid:

.2yellow (cannot start with dot and then number)
_yellow (cannot start with _)
5scones (cannot start with a number)
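If you are ever unsure about a name, one way to check it (a sketch, not the only approach) is with the base R function make.names(), which turns any string into a syntactically valid name. A string that comes back unchanged was already valid. The vector candidates and the name is_valid are illustrative choices:

```r
candidates <- c(
  "yellowBrickRoad", "yell23", ".yellow",
  ".2yellow", "_yellow", "5scones"
)

# make.names() repairs invalid names, so a name is valid
# exactly when make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
is_valid  # TRUE for the first three names, FALSE for the last three
```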

Most programmers try to devise names for variables that are descriptive in the sense that they suggest to a reader of the code the role that is played within it by the variable. In addition they try to stick to a consistent system for variable names that divide naturally into meaningful words.

One popular convention is known as CamelCase. In this convention each new word-like part of the variable name begins with a capital letter. (The initial letter, though, is often not capitalized.) Examples would be:

emeraldCity
isEven

Another popular convention—sometimes called “snake-case”—is to use lowercase and to separate words with underscores:

emerald_city
is_even

An older convention—one that was popular among some of the original developers of R—was to separate words with dots:

emerald.city
is.even

You will often see this in R functions (examples you have met so far include as.numeric() and as.character()). This convention is no longer recommended, however, because in programming languages other than R the dot is associated syntactically with the calling of a “method” on an “object.”⁴

2.7.1 Practice Exercises

Tip 2.19: What Names are Permitted?

Problem
Solutions

For each of the following names, say whether or not it is a valid name in R. If it’s not a valid name, explain why.

ThreeLittlePigs
3LittlePigs
LittlePigs3
Little-Pigs-3
Little_Pigs_3
three.little.pigs

ThreeLittlePigs is a valid name.
3LittlePigs is not a valid name: you cannot begin a name with a digit.
LittlePigs3 is a valid name.
Little-Pigs-3 is not a valid name: hyphens are not allowed in names.
Little_Pigs_3 is a valid name.
three.little.pigs is a valid name.

2.8 More in Depth

2.8.1 More Math Functions

Here are a few more useful math functions involving vectors.

2.8.1.1 Cumulative Sums

Consider this vector:

sample_numbers <- c(4, 5, -3, 2, 0, 8)

If you sum the elements of this vector up to each of its successive indices, you get:

just the first element: 4;
$4+5=9$;
$4+5+(-3)=6$;
$4+5+(-3)+2=8$;
$4+5+(-3)+2+0=8$ again;
$4+5+(-3)+2+0+8=16$.

The function cumsum() will do the work for you:

cumsum(sample_numbers)

[1]  4  9  6  8  8 16

That is: for each $i$ from 1 to the length of sample_numbers, the $i$-th element of cumsum(sample_numbers) is the sum of the elements of sample_numbers from index 1 through index $i$.

2.8.1.2 Rounding

You can use the round() function to round off numbers to any desired number of decimal places.

roots <- sqrt(1:5)
roots # Too much information!

[1] 1.000000 1.414214 1.732051 2.000000 2.236068

round(roots, digits = 3) # nicer

[1] 1.000 1.414 1.732 2.000 2.236

2.8.1.3 Ceiling and Floor

The ceiling() function returns the least integer that is greater than or equal to the given number:

vec <- c(-2.9, -1.1, 0.2, 1.35, 3, 4.01)
ceiling(vec)

[1] -2 -1  1  2  3  5

The floor() function returns the greatest integer that is less than or equal to the given number:

floor(vec)

[1] -3 -2  0  1  3  4

Note that vectorization applies to all of these functions.

Read this section if you want to learn as much about R as quickly as you can.

2.8.2 Infinity in R

R has a special number called Inf. It is bigger than any real number, even a very big one like $10^{50}$:

10^50 < Inf

[1] TRUE

Accordingly, -Inf is less than any real number:

-Inf < 334

[1] TRUE

-Inf < -10^50

[1] TRUE

R has opinions about arithmetic with infinities. They make sense if you have done a couple of semesters of calculus:

mean(c(5, -1000, Inf))

[1] Inf

Inf + 1

[1] Inf

Inf - 1000

[1] Inf

Inf - Inf

[1] NaN

2.8.3 More About `NA`

2.8.3.1 `NA` in Subsetting

Think back to the heights of our Oz characters:

heights <- c(72, 70, 69, 58, NA, 45)
names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
heights

Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       72        70        69        58        NA        45

You should be aware of the effect of NA-values on subsetting.

tall <- (heights > 65)
tall

Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
     TRUE      TRUE      TRUE     FALSE        NA     FALSE

Since Toto’s height was missing, R can’t say whether or not he was more than 65 inches tall. Hence it assigns NA to the Toto-element of the tall vector.

When we subset using this vector we get an odd result:

heights[tall]

Scarecrow    Tinman      Lion      <NA> 
       72        70        69        NA

Since R doesn’t know whether or not to select Toto, it records its indecision by including an NA in the result. That NA, however, is not the NA for Toto’s height in the vector heights, so it can’t inherit the “Toto” name. Since it has no name, R presents its name as <NA>.

If we try to count the number of tall persons, we get a misleading result:

length(heights[tall])

[1] 4

We would have preferred something like:

“Three, with another one undecided.”

Counting is one of those situations in which we might wish to remove NA values at the start. If the vector is small we could remove them by hand, e.g.:

knownHeights <- heights[-5]  # remove Toto
tall <- (knownHeights > 65)
length(knownHeights[tall])

[1] 3

For longer vectors the above approach won’t be practical. Instead we may use the is.na() function.

is.na(heights)

Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
    FALSE     FALSE     FALSE     FALSE      TRUE     FALSE

Then we may select those elements that are not NA:

knownHeights <- heights[!is.na(heights)]
knownHeights

Scarecrow    Tinman      Lion   Dorothy       Boq 
       72        70        69        58        45

length(knownHeights[knownHeights > 65])

[1] 3

Is everyone more than 40 inches tall?

all(heights > 40)

[1] NA

Everyone with a known height is taller than 40 inches, but because Toto’s height is NA R can’t say whether all the heights are bigger than 40.

2.8.3.2 Contagious `NA`s in Arithmetic, and `na.rm`

When an arithmetic function produces one number out of many numbers, it is very susceptible to the presence of NA values.

For example, look at:

some_numbers <- c(3, 7, -2)
sum(some_numbers)

[1] 8

some_other_numbers <- c(3, 7, -2, NA)
sum(some_other_numbers)

[1] NA

mean(some_other_numbers)

[1] NA

It is reasonable that the sum and the mean should return NA: after all, if you don’t know what one of the numbers is, how can you find the sum of the numbers?

Similarly, the max() and the min() functions yield NA when one of the elements is NA:

max(3, 7, -2, NA)

[1] NA

min(3, 7, -2, NA)

[1] NA

You could say that NA is “contagious” in arithmetic functions.

Sometimes your vector will contain NAs, but you want to do arithmetic on the rest of the elements, that is, on the numbers that are not NA. You can do this with the na.rm parameter, which all of the arithmetic functions above (sum(), mean(), max() and min()) know about. By default this parameter is set to FALSE (meaning: don’t remove the NA values) but you can set it to TRUE (meaning: remove all the NA values and then get on with the requested arithmetic). Here are some examples:

max(3, 7, -2, NA, na.rm = TRUE)

[1] 7

sum(some_other_numbers, na.rm = TRUE)

[1] 8

2.8.3.3 When Math Goes Wrong: `NaN`

The results of some arithmetical operations are not defined. (Examples: you can’t divide 0 by 0; you can’t take the square root of a negative number.) R reports the results of such operations as NaN, “not a number.” For sqrt(), R also issues a warning:

sqrt(c(-4, 2, 4))

Warning in sqrt(c(-4, 2, 4)): NaNs produced

[1]      NaN 1.414214 2.000000

Keep in mind, though, that the result is a perfectly good vector as far as R is concerned. After the warning R will permit you to use it in further computations:

vec <- sqrt(c(-4, 2, 4))

Warning in sqrt(c(-4, 2, 4)): NaNs produced

vec + 3

[1]      NaN 4.414214 5.000000
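It is worth seeing how this connects with Inf from Section 2.8.2. The sketch below relies only on base R; is.nan() is a base function that tests for NaN, and the variable names are illustrative:

```r
# 0 divided by 0 is undefined, so R reports NaN
undefined_result <- 0 / 0
undefined_result          # NaN
is.nan(undefined_result)  # TRUE

# A nonzero number divided by 0 is NOT NaN in R: it is infinite
infinite_result <- 1 / 0
infinite_result           # Inf
-1 / 0                    # -Inf
```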

2.8.4 Syntax

In the process of learning about R, you have been unconsciously imbibing some of its syntax. The syntax of a computer-programming language is the complete set of rules that determine what combinations of symbols are considered to make a well-formed program in the language, that is, something that R can interpret and attempt to execute.

2.8.5 Syntax Errors vs. Run-time Errors vs. Semantic Errors

For the most part you will learn the syntax informally. By now, for example, you have probably realized that when you call a function you have to supply a closing parenthesis to match the open parenthesis. Thus the following is completely fine:

sum(1:5)

[1] 15

On the other hand, if you were to type sum(1:5 alone on a single line in an R script, R Studio’s code-checker would show a red warning-circle at that line. Hovering over the circle you would see the message:

unmatched opening bracket '('

If you were to attempt to run the command sum(1:5 from the script you would get the following error message:

## Error: Incomplete expression: sum(1:5

Such an error is called a syntax error.⁵ The R Studio IDE can detect most—but not all—syntax errors.

Syntax errors in computer programming are similar to grammatical errors in ordinary language, such as:

“Mice is scary.” (Number of the subject does not match the number of the verb.)
“Mice are.” (Incomplete expression.)

A run-time error is an error that occurs when the syntax is correct but R is unable to finish the execution of your code for some other reason. The following code, for example, is perfectly fine from a syntactical point of view:

sum("hello")

When run, however, it produces an error:

## Error in sum("hello") : invalid 'type' (character) of argument

Here is another example:

sum(emeraldCity)

Unless for some reason you have defined the variable emeraldCity, an attempt to run the above command will produce the following run-time error:

## Error: object 'emeraldCity' not found

Many run-time errors in computer programming resemble errors in ordinary language where the sentence is grammatically correct but does not mean anything, as in:

“Beelbubs are juicy.” (What’s a “beelbub”?)

There is a third type of error, known in the world of programming as a semantic error. The term “semantics” refers to the meaning of things. Computer code is said to contain a semantic error when it is syntactically correct and can be executed, but does not deliver the results one knows to expect.

As an example, suppose you have defined, at some point, two variables:

emeraldCity <- 15
emeraldcity <- 4

Suppose now that—wanting R to compute $15^2$—you run the following code:

emeraldcity^2

[1] 16

You don’t get the results you wanted, because you accidentally asked for the square of the wrong number.

Semantic errors are usually the most difficult errors for programmers to detect and repair.

2.8.6 More About the Assignment Operator

We have been using the assignment operator <- to assign values to variables. You should be aware that there is another assignment operator that works the other way around:

4 -> emeraldCity
emeraldCity

[1] 4

Most people don’t use it.

A popular alternative to <- as an assignment operator is the equals sign =:

emeraldCity = 5
emeraldCity

[1] 5

I myself prefer to stay away from it, as it can be confused with other uses of =, such as the setting of values to parameters in functions:

rep("Dorothy", times = 3)

[1] "Dorothy" "Dorothy" "Dorothy"

When you have to assign the same value to several variables, R allows you to abbreviate a bit. Consider the following code:

a <- b <- c <- 5

The above code has the same effect as:

a <- 5
b <- 5
c <- 5

2.8.7 Multiple Expressions

R allows you to write more than one expression on a single line, as long as you separate the expressions with semicolons:

a <- b <- c <- 5
a; b; c; 2+2; sum(1:5)

[1] 5

[1] 5

[1] 5

[1] 4

[1] 15

2.8.8 More on Variable Names; Reserved Words

There is one further restriction on variable-names that we have not yet mentioned: you are not allowed to use any of R’s reserved words. These are:

if, else, while, repeat, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_

You need not memorize the above list: you’ll gradually learn most of it, and words you don’t learn are words that you are unlikely to ever choose as a variable-name on your own. Besides, reserved words show up in blue in the R Studio editor, and if you manage to use one anyway then R will stop you outright with a clear error message:

break <- 5

## Error in break <- 5 : invalid (NULL) left side of assignment

Notice that TRUE and FALSE are reserved words. R actually allows abbreviations: T for TRUE and F for FALSE. You can see this from the following expression:

c(T, F, F, T)

[1]  TRUE FALSE FALSE  TRUE

However, T and F are not reserved words: instead they are just variable-names, and R has bound them (in its base package) to TRUE and FALSE respectively. Since they are just variable-names, they could be bound to other values.

This can lead to problems in code, if someone chooses to bind T or F to some value and you are not aware of their choice.

For example, suppose that you have two lines of code like this:

T <- 0
F <- 1

Later on, suppose you create what you think is a logical vector:

myVector <- c(T, F, F, T)

But it’s not logical:

typeof(myVector)

[1] "double"

That’s because T and F have been bound to numerical values. If you coerce myVector to a logical vector, you get the exact opposite of what you would have expected:

as.logical(myVector)

[1] FALSE  TRUE  TRUE FALSE

The moral of the story is:

Warning

Don’t use T for TRUE or F for FALSE, even though R allows it.

One final remark: variables together with reserved words constitute the part of the R language called identifiers.

2.8.9 Practice Exercises

Suppose that you begin a new R session and that you run the following code:

person <- c(
  "Abe", "Bettina", "Candace", 
  "Devadatta", "Esmeralda"
)
numberKids <- c(2, 1, 0, 2, 3)
yearsEducation <- c(12, 16, 13, 14, 18)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE)

What sort of error (syntax, runtime or semantic) is produced by the next piece of code, which is intended to produce the names of the people with more than 15 years of education? Why?

person(yearsEducation > 15]

What sort of error (syntax, runtime or semantic) is produced by the next piece of code, which is intended to produce the names of the people who don’t have pets? Why?

person(!haspets]

What sort of error (syntax, runtime or semantic) is produced by the next piece of code, which is intended to find out how many people have pets? Why?

length(hasPets)

2.8.10 Solutions to the Practice Exercises

This will result in a syntax error. You need brackets to select and you’ve got a parenthesis on the left. The correct syntax would be:

person[yearsEducation > 15]

This will result in a runtime error. The variable haspets is not defined, so R will issue a “can’t find” error when the code is executed. Probably you meant:

person[!hasPets]

This will result in a semantic error. You’ll get 5 (the number of elements in the vector hasPets). What you want can be accomplished by either one of the following:

sum(hasPets)              # nice and snappy!
length(hasPets[hasPets])  # kinda awkward, but it works

2.8.11 Optional Glossary (For This Section Only)

Reserved Words: Identifiers that are set aside by R for specific programming purposes. They cannot be used as names of variables.
Syntax: The complete set of rules for a computer language that determine what combinations of symbols are considered to make a well-formed program in the language.
Syntax Error: A sequence of symbols that contains a violation of one of the rules of syntax. R is unable to interpret and attempt to execute code that contains a syntax error.
Run-time Error: An error that occurs when the computer language’s interpreter attempts to execute code but is unable to do so. A typical cause of a run-time error is the situation when the code calls for the evaluation of a name that has not been bound to an object.
Semantic Error: An error in code that is syntactically correct and that can be executed by the computer but which produces unexpected results.

The Main Ideas of This Chapter

We meet four types of atomic vectors in this course: logical, integer, character and double.
- logical: TRUEs and FALSEs;
- integer: put an L after a whole number, like 3L or -2L;
- double: these can have decimal points, like 3.1415, and can also be expressed with scientific notation, like 5.1e06 (which stands for $5.1 \times 10^6$, or 5100000);
- character: these are strings, like "Dorothy" and "3.14".
The c() function concatenates (combines) vectors together.
Everything in R is a vector, even things that look like lone values: 3.2 or TRUE or "Dorothy".
letters gives the lowercase letters of the English alphabet, and LETTERS gives you the uppercase letters.
You can use rep() and seq() and the colon operator : to construct various patterned vectors. Know the arguments for the rep() and seq() functions, and know how to use them in combination to make complex patterned vectors.
You can subset vectors with the [ operator, e.g. letters[1:5] gives you the first five lowercase letters.
Sub-setting with logical vectors: Boolean operators help you make Boolean expressions that evaluate to logical vectors. These can be used in sub-setting, as in letters[letters <= "r"].
Review which(), any() and all(), but especially keep in mind the “in”-operator %in%, as in 3 %in% 5:400 (which is FALSE by the way).
R does the basic arithmetic you would expect. Especially keep in mind the mod-operator %% that finds remainders.
Know the rules for what makes a legal variable-name in R.
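
As a quick review, the following sketch runs through several of the ideas above in one place. The variable names are made up for illustration; the expected results appear in the comments.

```r
# The four vector types met in this chapter
types <- c(typeof(TRUE), typeof(3L), typeof(5.1e06), typeof("3.14"))
# "logical" "integer" "double" "character"

# Even a lone value is a vector (of length 1)
loneLength <- length(3.2)                  # 1

# Patterned vectors
evens <- seq(2, 10, by = 2)                # 2 4 6 8 10
abab <- rep(c("a", "b"), times = 2)        # "a" "b" "a" "b"
aabb <- rep(c("a", "b"), each = 2)         # "a" "a" "b" "b"

# Sub-setting by position and with a logical vector
firstFive <- letters[1:5]                  # "a" "b" "c" "d" "e"
upToC <- letters[letters <= "c"]           # "a" "b" "c"

# which(), any(), all() and %in%
positions <- which(c(FALSE, TRUE, TRUE))   # 2 3
anyBig <- any(c(1, 5, 9) > 8)              # TRUE
allBig <- all(c(1, 5, 9) > 8)              # FALSE
isMember <- 3 %in% 5:400                   # FALSE

# Remainder and quotient
remainder <- 17 %% 5                       # 2
quotient <- 17 %/% 5                       # 3
```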

Glossary for the Chapter

Vector Type: Any one of the six basic forms the elements in an atomic vector can take. The four types we will encounter the most are: double, integer, character and logical.
Coercion: The process of changing a vector from one type to another. Sometimes the process takes place automatically, as a convenience to the programmer.
Sub-setting: The operation of selecting one or more elements from a vector.
Recycling: An automatic process by which R, when given two vectors, repeats elements of the shorter vector until it is as long as the longer vector. Recycling enables the two resulting vectors to be combined element-wise in operations.
Vectorization: R’s ability to operate on each element of a vector, producing a new vector of the same length. Vectorized operations can be expressed concisely and performed very quickly.
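
Each of these terms can be seen at work in a line or two of code. The sketch below uses a hypothetical vector named scores; the values are invented for illustration.

```r
# Coercion: c() converts mixed elements to a single common type
mixed <- c(1.5, TRUE, 2L)
mixedType <- typeof(mixed)       # "double"; TRUE became 1, 2L became 2
seven <- as.integer("7")         # explicit coercion from character to integer

scores <- c(10, 20, 30, 40)      # hypothetical data

# Sub-setting: select the first and third elements
picked <- scores[c(1, 3)]        # 10 30

# Vectorization: the operation is applied to every element
doubled <- scores * 2            # 20 40 60 80

# Recycling: the shorter vector c(1, 2) is repeated to length 4,
# becoming 1 2 1 2, before the element-wise addition
bumped <- scores + c(1, 2)       # 11 22 31 42
```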

Exercises

Exercise 1

Determine the type of each of the following vectors:

c(3.2, 2L, 4.7, TRUE)
c(as.integer(3.2), 2L, 5L, TRUE)
c(as.integer(3.2), 2L, "5L", TRUE)

Exercise 2

Using a combination of c(), rep() and seq() and other operations, find concise one-line programs to produce each of the following vectors:

all numbers from 4 to 307 that are one more than a multiple of 3;
the numbers 0.01, 0.02, 0.03, …, 0.98, 0.99.
twelve 2’s, followed by twelve 4’s followed by twelve 6’s, …, followed by twelve 10’s, finishing with twelve 12’s. (Hint: Review Section 2.2.2 and also Practice 2.8.)
one 1, followed by two 2’s, followed by three 3’s, …, followed by nine 9’s, finishing with ten 10’s. (Hint: Again review Section 2.2.2 and also Practice 2.8.)

Exercise 3

Using a combination of c(), rep() and seq() and other operations, find concise one-line programs to produce each of the following vectors:

the numbers 15, 20, 25, …, 145, 150.
the numbers 1.1, 1.2, 1.3, …, 9.8, 9.9, 10.0.
ten A’s followed by ten B’s, …, followed by ten Y’s and finishing with ten Z’s. (Hint: the special vector LETTERS will be useful.)
one a, followed by two b’s, followed by three c’s, …, followed by twenty-five y’s, finishing with twenty-six z’s. (Hint: the special vector letters will be useful, and you should review Practice 2.8.)

For Exercises 4, 5, 6, and 7 it is a good idea to first review Section 2.5.

Exercise 4

The following four vectors give the names, ages and heights of five people, and also say whether or not each person likes Toto:

person <- c("Akash", "Bee", "Celia", "Devadatta", "Enid")
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

Use sub-setting with logical vectors to produce vectors of:

the names of all people over the age of 22;
the names of all people younger than 24 who are also more than 67 inches tall;
the names of all people who either don’t like Toto or who are over the age of 30;
the number of people who are over the age of 22.

Exercises 5, 6 and 7 below use the vectors defined in Exercise 4.

Exercise 5

Use sub-setting with logical vectors to produce vectors of:

the names of all people who are less than 70 inches tall;
the names of all people who are between 20 and 30 years of age (not including 20 or 30);
the names of all people who either like Toto or who are under the age of 50;
the number of people who are more than 69 inches tall.

Exercise 6

Use the sum() function along with logical vectors to find:

the number of people younger than 24 who are also more than 67 inches tall;
the number of people who either don’t like Toto or who are over the age of 30.

Hint: Review Section 2.5.1.

Exercise 7

Read the previous problem, and then use sum() along with logical vectors to find:

the number of people between 65 and 70 inches tall (including 65 and 70);
the number of people who either don’t like Toto or who are under the age of 25.

Hint: Review Section 2.5.1.

So-called after George Boole, a nineteenth century British logician.
A vector is said to be numerical if it is of type integer or double.
See help(make.names).
We will look briefly at R’s object-oriented capabilities in Chapter 15.
R is a bit more forgiving if you type sum(1:5 directly into the console and press Enter. Instead of throwing an error, R shows a + prompt, hoping for further input that would correctly complete the command. If you are ever in the situation where you do not know how to complete the command, you may simply press the Escape key (upper left-hand corner of your keyboard): R will then abort the command and return to a regular prompt.
