2 Vectors
This chapter gets you started officially with R. While the theme is vectors, the most important data structure in R, we’ll also learn about variables and variable names, vector types, reserved words, assignment and many of R’s basic operators.
2.1 What is a Vector?
If you have heard of vectors before in mathematics, you might think of a vector as something that has a magnitude and a direction, and that can be represented by a sequence of numbers. In its notion of a vector, R keeps the idea of a sequence but discards magnitude and direction. The notion of “numbers” isn’t even necessary.
For R, a vector is simply a sequence of elements. There are two general sorts of vectors:
- atomic vectors, which come in one of six forms called vector types;
- non-atomic vectors, called lists, whose elements can be any sort of R object at all.
For now we’ll just study atomic vectors. Let’s make a few vectors, as examples.
We can make a vector of numbers using the c() function:
numVec <- c(23.2, 45, 631, 273, 0, 48.371, 100000,
            85, 92, 236, 8546, 98774, 0, 0, 1, 3)
numVec
## [1] 23.200 45.000 631.000 273.000 0.000 48.371 100000.000
## [8] 85.000 92.000 236.000 8546.000 98774.000 0.000 0.000
## [15] 1.000 3.000
You can think of c as standing for “combine.” c() takes its arguments, which are separated by commas, and combines them to make a vector.
If you closely examine the above output, you’ll notice that R printed all of the numerical values in the vector to three decimal places, which happens to be the largest number of decimal places we gave any of the numbers that make up numVec. You’ll also notice the numbers in brackets at the beginning of the lines. Each one is the position, within the vector, of the first element printed on that line. The position of an element in a vector is called its index. Reporting the indices of leading elements helps you locate particular elements in the output.
2.1.1 Types of Atomic Vectors
The numbers in numVec are what programmers call double-precision numbers. You can verify this for yourself with the typeof() function:
typeof(numVec)
## [1] "double"
The typeof() function returns the type of any object in R. As far as vectors are concerned, there are six possible types, of which we will deal with only four:
double
integer
character
logical
Let’s look at examples of the other types. Here is a vector of type integer:
intVec <- c(3L, 17L, 22L, 45L)
intVec
## [1] 3 17 22 45
The L after each number signifies to R that the number should be stored in memory as an integer, rather than in double-precision format. Officially, the type is integer:
typeof(intVec)
## [1] "integer"
You should know that if you left off one or more of the L’s, then R would create a vector of type double:
typeof(c(3L, 17, 22L, 45L))
## [1] "double"
We won’t work much with integer-type vectors, but you’ll see them out in the wild.
We can also make vectors out of pieces of text called strings: these are called character vectors. As noted in the previous chapter, we use quotes to delimit strings:
strVec <- c("Brains", "are", "not", "the", "best",
            "things", "in", "the", "world", "93.2")
strVec
## [1] "Brains" "are" "not" "the" "best" "things" "in" "the" "world"
## [10] "93.2"
typeof(strVec)
## [1] "character"
Notice that "93.2" makes a string, not a number.
The last type of vector to consider is the logical vector. Here is an example:
logVec <- c(TRUE, FALSE, T, T, F, F, FALSE)
logVec
## [1] TRUE FALSE TRUE TRUE FALSE FALSE FALSE
In order to represent a logical value you can use:
- TRUE or T to represent truth;
- FALSE or F to represent falsity.
You can’t represent truth or falsity any other way. If you try anything else—like the following—you get an error:
badVec <- c(TRUE, false)
## Error: object 'false' not found
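Note: Although R allows T to be interpreted as TRUE and F as FALSE, it can be dangerous to rely on them: T and F are ordinary variables that can be reassigned, whereas TRUE and FALSE are reserved words that always mean truth and falsity. Best to get into the habit of always using TRUE and FALSE, rather than the permitted abbreviations.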
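To see why the abbreviations can be risky, here is a minimal sketch, assuming a fresh R session. It relies only on the fact that T and F are ordinary variables that R pre-sets to TRUE and FALSE, whereas TRUE and FALSE are reserved words that cannot be reassigned.

```r
# T and F are ordinary variables that R pre-sets to TRUE and FALSE.
# Unlike the reserved words TRUE and FALSE, they can be overwritten.
T <- 0            # legal, but unwise
c(T, TRUE)        # T now holds the number 0, so TRUE is coerced to 1
## [1] 0 1
rm(T)             # delete our T so that R's built-in T is visible again
T
## [1] TRUE
```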
2.1.2 Coercion
What would happen if you tried to represent falsity with the string "false"?
newVector <- c(TRUE, "false")
newVector
## [1] "TRUE" "false"
newVector is not a logical vector. Check it out:
typeof(newVector)
## [1] "character"
In order to understand what just happened here, recall that all of the elements of an atomic vector have to be of the same type. If the c() function is presented with values of different types, then R follows a set of internal rules to coerce some of the values to a new type, so that all of the resulting values are of the same type. You don’t need to know all of the coercion rules, but it’s worth noting that character beats double, which in turn beats integer, which in turn beats logical.
The following examples show this:
typeof(c("a", 2.5, 3L, TRUE))
## [1] "character"
typeof(c(2.5, 3L, TRUE))
## [1] "double"
typeof(c(3L, TRUE))
## [1] "integer"
Automatic coercion can be convenient in some circumstances, but in others it can give unexpected results. It’s best to keep track of what types you are dealing with and to exercise caution when combining values to make new vectors.
You can also coerce vectors “manually” with the functions as.numeric(), as.integer(), as.character() and as.logical(). Here are some examples:
numVec <- c(3, 2.5, 7.32, 0)
as.character(numVec)
## [1] "3" "2.5" "7.32" "0"
as.integer(numVec)
## [1] 3 2 7 0
as.logical(numVec)
## [1] TRUE TRUE TRUE FALSE
Note that in coercion from numerical to logical, the number 0 becomes FALSE and all non-zero numbers become TRUE.
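Two more coercion facts are worth having at hand. The sketch below, using a made-up vector x, shows that as.integer() drops the fractional part (truncating toward zero) rather than rounding, which is why 2.5 became 2 above, and that logical values coerce to the numbers 1 and 0.

```r
x <- c(2.9, -2.9, 0.5)          # a hypothetical vector for illustration
as.integer(x)                   # drops the fractional part (truncates toward zero)
## [1]  2 -2  0
as.numeric(c(TRUE, FALSE))      # logical to numeric: TRUE becomes 1, FALSE becomes 0
## [1] 1 0
```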
2.1.3 Combining Vectors
You can combine vectors you have already created to make new, bigger ones:
numVec1 <- c(5, 3, 10)
numVec2 <- c(1, 2, 3, 4, 5, 6)
numCombined <- c(numVec1, numVec2)
numCombined
## [1] 5 3 10 1 2 3 4 5 6
You can see here that vectors are different from sets: the same value may appear at more than one index, as we see in the case of the 3’s above.
2.1.4 NA Values
Consider the following vector, which we may think of as recording the heights of people, in inches:
heights <- c(72, 70, 69, 58, NA, 45)
The NA in the fifth position of the vector is a special value that stands for “Not Available.” It’s R’s way of letting us indicate that a value was not recorded or has gone missing for some reason.
2.1.5 “Everything in R is a Vector”
Some folks say that everything in R is a vector. That’s a bit of an exaggeration but it’s remarkably close to the truth.
And yet it seems implausible. What about the elements of an atomic vector, for instance? A single element doesn’t look at all like a vector: it’s a value, not a sequence of values.
Or so we might think. But really, in R there are no “single values” that can exist by themselves. Consider, for instance, what we think of as the number 17:
17
## [1] 17
See the [1] in front, in the output above? It indicates that the line begins with the first element of a vector. So 17 doesn’t exist on its own: it exists as a vector of type double, a vector of length 1.
Even NA is, all along, a vector of length 1:
NA
## [1] NA
It is of type logical:
typeof(NA)
## [1] "logical"
Note that even the type of NA evaluates, in R, to a vector: a character vector of length 1 whose only element is the string "logical".
2.1.6 Named Vectors
The elements of a vector can have names, if we like:
ages <- c(Bettina = 32, Chris = 64, Ramesh = 101)
ages
## Bettina Chris Ramesh
## 32 64 101
Having names doesn’t keep the vector from being a vector of type double: it has to be double because its elements are double.
typeof(ages)
## [1] "double"
We can name the elements of a vector when we create it with c(), or we can name them later on. One way to do this is with the names() function:
names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 72 70 69 58 NA 45
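The names() function works in the other direction too: called on its own, it returns the names as a character vector. A short sketch using the ages vector from above; unname(), a base R function not otherwise used in this chapter, gives back the values without their names.

```r
ages <- c(Bettina = 32, Chris = 64, Ramesh = 101)
names(ages)      # the names, as the character vector c("Bettina", "Chris", "Ramesh")
unname(ages)     # the values 32, 64, 101 with the names stripped off
```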
2.1.7 Special Character Vectors
R comes with two handy, predefined character vectors:
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u"
## [22] "v" "w" "x" "y" "z"
LETTERS
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"
## [22] "V" "W" "X" "Y" "Z"
We will make use of them from time to time.
2.1.8 Length of Vectors
The length() function tells us how many elements a vector has:
length(heights)
## [1] 6
2.1.9 Practice Exercises

Consider the following vector:
upperLower <- c(LETTERS, letters)
What should the length of upperLower be? Check your answer using the length() function.

True or False: c("a", 2, TRUE) yields a vector of length three consisting of the string "a", the number 2 and the logical value TRUE.

The function as.numeric() tries to coerce its input into numbers. How well can it pick out the “numbers” in strings? Try the following calls. When did as.numeric() find the number that was probably intended?
as.numeric("3.214")
as.numeric("3L")
as.numeric("fifty")
as.numeric("10 + 3")
as.numeric("3.25e-2") # scientific notation: 3.25 times 10^(-2)
as.numeric("31,245")
2.1.10 Solutions to Practice Exercises

There are 26 letters, so the length of upperLower should be \(2 \times 26 = 52\). Let’s check:
length(upperLower)
## [1] 52

False! The resulting vector will be atomic: all of its elements will be of the same data type. The non-strings will be coerced to strings, yielding:
c("a", 2, TRUE)
## [1] "a" "2" "TRUE"

as.numeric() isn’t very smart: it picked out the numbers in "3.214" and "3.25e-2", but in the other cases it returned NA (along with a warning that NAs were introduced by coercion).
2.2 Constructing Patterned Vectors
Quite often we need to make lengthy vectors that follow simple patterns. R has a few functions to assist us in these tasks.
2.2.1 Sequencing
Consider the seq() function:
seq(from = 5, to = 15, by = 1)
## [1] 5 6 7 8 9 10 11 12 13 14 15
The default value of the parameter by is 1, so we could get the same thing with:
seq(from = 5, to = 15)
## [1] 5 6 7 8 9 10 11 12 13 14 15
Further reduction in typing may be achieved as long as we remember the order in which R expects the parameters (from before to, then by if supplied):
seq(5, 15)
## [1] 5 6 7 8 9 10 11 12 13 14 15
Some more complex examples:
seq(3, 15, 2)
## [1] 3 5 7 9 11 13 15
seq(0, 1, 0.1)
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
R will go up to the to value, but not past it:
seq(3, 16, 2)
## [1] 3 5 7 9 11 13 15
Negative steps are fine:
seq(5, -4, -1)
## [1]  5  4  3  2  1  0 -1 -2 -3 -4
The colon operator : is a convenient abbreviation for seq() with a step of 1:
1:5 # 1 is from, 5 is to
## [1] 1 2 3 4 5
If the from number is greater than the to number, the step for the colon operator is -1:
5:1
## [1] 5 4 3 2 1
2.2.2 Repeating
With rep() we may repeat a given vector as many times as we like:
rep(3, times = 5)
## [1] 3 3 3 3 3
We can apply rep() to a vector of length greater than 1:
rep(c(7, 3, 4), times = 3)
## [1] 7 3 4 7 3 4 7 3 4
rep() applies perfectly well to character vectors:
rep("Toto", 4)
## [1] "Toto" "Toto" "Toto" "Toto"
rep() also takes an each parameter that determines how many times each element of the given vector will be repeated before the times parameter is applied. This is best illustrated with an example:
rep(c(7, 3, 4), each = 2, times = 3)
## [1] 7 7 3 3 4 4 7 7 3 3 4 4 7 7 3 3 4 4
If we combine seq() and rep() we can create fairly complex patterns concisely:
rep(seq(-5, 3, by = 2), each = 2, times = 2)
## [1] -5 -5 -3 -3 -1 -1  1  1  3  3 -5 -5 -3 -3 -1 -1  1  1  3  3
To create fifty 10’s followed by fifty 30’s followed by fifty 50’s, I would write:
rep(seq(10, 50, by = 20), each = 50)
2.2.3 Practice Exercises

Use rep() to make the following vector:
## [1] "Kansas" "Kansas" "Kansas" "Kansas" "Kansas"

Use rep() to make the following vector:
## [1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE

Use seq() to make the following vector:
## [1] 5 8 11 14 17 20 23 26
Use seq() to make all of the multiples of 4, beginning with 8 and going down to -32.

Use the colon operator to make all of the whole numbers from 10 to 20.

Use the colon operator to make all of the whole numbers from 10 to 30.

You have a vector named myVec. Use the colon operator and the length() function to make all of the whole numbers from 1 to the length of myVec.
Use rep() and seq() together to make the following vector:
## [1] 2 3 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10

Use rep() and seq() together to make the following vector:
## [1] 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10

Read the Help for rep():
?rep
It tells you that the first argument of rep() is the vector that you want to repeat, and that it’s called x. It goes on to say that times is:
“an integer-valued vector giving the (non-negative) number of times to repeat each element if of length length(x), or to repeat the whole vector if of length 1.”
Use this information to describe in words what will be the output of:
rep(seq(10, 100, by = 10), times = 1:10)
2.2.4 Solutions to Practice Exercises

Here’s how:
rep("Kansas", times = 5)

Here’s how:
rep(c(TRUE, FALSE), times = 4)
Here’s how:
seq(5, 26, by = 3)

Here’s how:
seq(8, -32, by = -4)

Here’s how:
10:20

Here’s how:
10:30

All you need is this:
1:length(myVec)

Here’s how:
rep(seq(2, 10), times = 3)
Here’s how:
rep(seq(2, 10), each = 3)
You’ll get one 10, two 20s, three 30s, …, all the way up to ten 100s.
rep(seq(10, 100, by = 10), times = 1:10)
## [1] 10 20 20 30 30 30 40 40 40 40 50 50 50 50 50 60 60 60 60 60 60
## [22] 70 70 70 70 70 70 70 80 80 80 80 80 80 80 80 90 90 90 90 90 90
## [43] 90 90 90 100 100 100 100 100 100 100 100 100 100
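One caution about the 1:length(myVec) solution: if myVec happens to be empty, length(myVec) is 0 and 1:0 counts down, producing 1 0 instead of nothing. The sketch below, with a hypothetical empty myVec, contrasts this with the base R functions seq_len() and seq_along(), which return a sequence of length 0 in that case.

```r
myVec <- numeric(0)        # a hypothetical empty vector of type double
length(myVec)
## [1] 0
1:length(myVec)            # 1:0 counts down from 1 to 0: not what we wanted
## [1] 1 0
seq_len(length(myVec))     # a sequence of length 0, as we wanted
## integer(0)
seq_along(myVec)           # same idea, working from the vector itself
## integer(0)
```

For a non-empty vector, seq_along() gives the same result as 1:length(), so it is a safe habit to adopt.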
2.3 Subsetting Vectors
Quite often we need to select one or more elements from a vector. The subsetting operator [ allows us to do this.
Recall the vector heights:
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 72 70 69 58 NA 45
If we want the fourth element, we ask for it with the subsetting operator like this:
heights[4]
## Dorothy
## 58
If we want two or more elements, then we specify their indices in a vector. Thus, to get the first and fifth elements, we might do this:
desired <- c(1,5)
heights[desired]
## Scarecrow Toto
## 72 NA
We could also ask for them directly:
heights[c(1,5)]
## Scarecrow Toto
## 72 NA
Negative numbers are significant in subsetting:
heights[-2] # select all but the second element
## Scarecrow Lion Dorothy Toto Boq
## 72 69 58 NA 45
heights[-c(1,3)] # all but the first and third
## Tinman Dorothy Toto Boq
## 70 58 NA 45
If you specify a nonexistent index, you get NA, the reasonable result:
heights[7]
## <NA>
## NA
Patterned vectors are quite useful for subsetting. If you want the first three elements of heights, you don’t have to type heights[c(1,2,3)]. Instead you can just say:
heights[1:3]
## Scarecrow Tinman Lion
## 72 70 69
The following gives the same as heights:
heights[1:length(heights)]
## Scarecrow Tinman Lion Dorothy Toto Boq
## 72 70 69 58 NA 45
If you want to provide names for a vector quickly, subsetting can help:
vec <- c(23, 14, 82, 33, 33, 45)
names(vec) <- LETTERS[1:6]
vec
## A B C D E F
## 23 14 82 33 33 45
If a vector has names we can refer to its elements using the subsetting operator and those names:
heights["Tinman"]
## Tinman
## 70
heights[c("Scarecrow", "Boq")]
## Scarecrow Boq
## 72 45
Finally, we can use subsetting to modify parts of a vector. For example, Dorothy’s height is reported as:
heights["Dorothy"]
## Dorothy
## 58
If Dorothy grows two inches, then we can modify her height as follows:
heights["Dorothy"] <- 60
We can replace more than one element, of course. Thus:
heights[c("Scarecrow", "Boq")] <- c(73, 46)
The subset of indices may be as complex as you like:
vec <- c(3, 4, 5, 6, 7, 8)
vec[seq(2, 6, 2)] <- c(100, 200, 300)
vec
## [1] 3 100 5 200 7 300
In the above example, seq(2,6,2) identified 2, 4 and 6 as the indices of elements of vec that were to be replaced by the corresponding elements of c(100, 200, 300).
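Assignment by subsetting can also make a vector longer. In this small sketch (the vector v is made up for illustration), we assign to an index past the end; R extends the vector and fills the gap with NA.

```r
v <- c(3, 4, 5)    # a hypothetical vector of length 3
v[5] <- 10         # index 4 doesn't exist yet, so R fills it with NA
v
## [1]  3  4  5 NA 10
```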
We can even use subsetting to rearrange the elements of a vector. Consider the example below:
wizardly <- c("Oz", "Toto", "Boq", "Glinda")
wizardly[c(3, 4, 1, 2)]
## [1] "Boq" "Glinda" "Oz" "Toto"
2.3.1 Practice Exercises
We’ll work with the following vector:
practiceVec <- c(4, 3, 7, 10, 5, 3, 8)
Select the fifth element of practiceVec.
Select the third and sixth elements of practiceVec.
Select the first, second, third and fourth elements of practiceVec.
How would you select the last element of the vector mysteryVec if you did not know how many elements it had?
Select all but the fourth element of practiceVec.
Select all but the fourth and sixth elements of practiceVec.
Select the even-numbered elements of practiceVec.
Replace the third element of practiceVec with the number 5.
Replace the even-numbered elements of practiceVec with zeroes.
Replace the second, third and fifth elements of practiceVec with 3, 10, and 20 respectively.
Reverse the order of the elements of practiceVec.
2.3.2 Solutions to Practice Exercises
practiceVec[5]
practiceVec[c(3,6)]
practiceVec[1:4]
mysteryVec[length(mysteryVec)]
practiceVec[-4]
practiceVec[-c(4,6)]
practiceVec[seq(2, length(practiceVec), by = 2)]
practiceVec[3] <- 5

Here is one way:
zeroes <- rep(0, times = length(seq(2, length(practiceVec), by = 2)))
practiceVec[seq(2, length(practiceVec), by = 2)] <- zeroes
Here’s a quicker way:
practiceVec[seq(2, length(practiceVec), by = 2)] <- 0
The latter approach involves “recycling” the zero. We’ll discuss recycling soon.
practiceVec[c(2, 3, 5)] <- c(3, 10, 20)
practiceVec[length(practiceVec):1]
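R also has a built-in function, rev(), that reverses a vector. This sketch, using the original practiceVec, checks that it agrees with the subsetting solution above.

```r
practiceVec <- c(4, 3, 7, 10, 5, 3, 8)
rev(practiceVec)
## [1]  8  3  5 10  7  3  4
identical(rev(practiceVec), practiceVec[length(practiceVec):1])
## [1] TRUE
```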
2.4 More on Logical Vectors
Consider the following expression:
13 < 20
## [1] TRUE
We constructed it with the “less-than” operator <. You can think of it as saying that 13 is less than 20, which is a true statement, and sure enough, R evaluates the expression 13 < 20 as TRUE.
When you think about it, we’ve seen lots of expressions so far. Here are just a few of them:
sqrt(64)
heights
heights[1:3]
13 < 20
When we type any one of them into the console, it evaluates to a particular value. In the examples above, the value was always a vector.
Expressions like 13 < 20 that evaluate to a logical vector are often called Boolean expressions.^{2}
2.4.1 Boolean Operators
Let’s look further into Boolean expressions. Define the following two vectors:
a <- c(10, 13, 17)
b <- c(8, 15, 12)
Now let’s evaluate the expression a < b:
a < b
## [1] FALSE TRUE FALSE
The < operator, when applied to vectors, always works element-wise; that is, it is applied to corresponding elements of the vectors on either side of it. R’s evaluation of a < b involves evaluation of the following three expressions:
- 10 < 8 (evaluates to FALSE)
- 13 < 15 (evaluates to TRUE)
- 17 < 12 (evaluates to FALSE)
The result is a logical vector of length 3.
The < operator is an example of a Boolean operator in R. Table 2.4.1 shows the available Boolean operators.
Operation    What It Means
<            less than
>            greater than
<=           less than or equal to
>=           greater than or equal to
==           equal to
!=           not equal to
&            and
|            or
&&           and (scalar version)
||           or (scalar version)
!            not
2.4.1.1 Inequalities
The “numerical-looking” operators (<, <=, >, >=) have their usual meanings when one is working with numerical vectors.^{3} When applied to character vectors they evaluate according to an alphabetical order:
c("Dorothy", "toto", "witch") < c("toto", "Toto", "3")
## [1] TRUE TRUE FALSE
The reasons for the evaluation above are as follows:
- D comes before t in the alphabet;
- lowercase t comes before uppercase T, according to R;
- characters for numbers come before letter characters, according to R.
The exact ordering R uses depends on the locale (language settings) of your computer, so on some systems the second comparison can come out differently.
2.4.1.2 Equality
The equality operator (==) indicates whether the expressions being compared evaluate to the same value. Note that it’s made with two equal signs, not one! It’s all about evaluation to the same value, not strict identity. The following examples will help to clarify this.
a <- c(Dorothy = 12, Toto = 5)
b <- c(10 + 2, 5)
a == b
## Dorothy Toto
## TRUE TRUE
(Note that the resulting logical vector inherits the names of a, the vector on the left.)
But a and b aren’t identical. We can see this because R has the function identical() to test for identity:
identical(a, b)
## [1] FALSE
Corresponding elements of a and b have the same values, but the two vectors don’t have the same set of names, so they aren’t considered identical.
Here’s another way to see that “evaluating to the same value” is not the same as “identity”:
TRUE == 1
## [1] TRUE
When TRUE (itself of type logical) is compared with something numerical (type integer or double), it is coerced into the numerical vector 1. (In the same situation FALSE would be coerced to 0.) But clearly TRUE and 1 are not identical:
identical(TRUE, 1)
## [1] FALSE
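A related caution about ==: doubles are stored in binary, so two calculations that ought to give the same number can differ in the last few digits and fail an == test. A sketch of the usual remedy, comparing with a small tolerance through the base function all.equal() wrapped in isTRUE():

```r
0.1 + 0.2 == 0.3                     # exact comparison of two doubles
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # comparison allowing a tiny tolerance
## [1] TRUE
```

isTRUE() is used because all.equal() returns a character description of the difference, rather than FALSE, when the values are not close.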
2.4.1.3 And, Or, Not
We consider an “and” statement to be true when both of its component statements are true; otherwise it is counted as false. The & Boolean operator accords with our thinking:
c(TRUE, TRUE, FALSE, FALSE) & c(TRUE, FALSE, TRUE, FALSE)
## [1] TRUE FALSE FALSE FALSE
In logic and mathematics, an “or” statement is considered to be true when at least one of its component statements is true. (This is sometimes called the “inclusive” use of the term “or.”) R accords with this line of thinking:
c(TRUE, TRUE, FALSE, FALSE) | c(TRUE, FALSE, TRUE, FALSE)
## [1] TRUE TRUE TRUE FALSE
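The && and || operators follow the “and” and “or” logic respectively, but they are meant for vectors of length 1. (Older versions of R simply used the first elements of longer vectors; since R 4.3.0, giving && or || a vector of length greater than 1 is an error.)
TRUE && (3 > 2)
## [1] TRUE
FALSE || TRUE
## [1] TRUE
These operators will come in handy later on, when we study conditionals.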
The final Boolean operator is !, which works like “not”:
a <- c(TRUE, FALSE)
!a
## [1] FALSE TRUE
e <- c(1, 5, 7)
f <- c(3, 2, 4)
e > f
## [1] FALSE TRUE TRUE
!(e > f)
## [1] TRUE FALSE FALSE
2.5 Vector Recycling
Consider the vector
vec <- c(2, 6, 1, 7, 3)
Look at what happens when we evaluate the expression:
vec > 4
## [1] FALSE TRUE FALSE TRUE FALSE
At first blush this doesn’t make any sense: vec has length 5, whereas 4 is a vector of length 1. How can the two of them be compared?
They cannot, in fact, be compared directly. Instead the shorter of the two vectors, the 4, is recycled into c(4,4,4,4,4), a vector of length five, which may then be compared element-wise with vec. Recycling is a great convenience as it allows us to express an idea clearly and concisely.
Recycling is always performed on the shorter of two vectors. Consider the example below:
vec2 <- 1:6
vec2 > c(3,1)
## [1] FALSE TRUE FALSE TRUE TRUE TRUE
Here, c(3,1) was recycled into c(3,1,3,1,3,1) prior to being compared with vec2.
What happens if the length of the longer vector is not a multiple of the shorter one? We should look into this:
vec2 <- 1:7
vec2 > c(3, 8)
## Warning in vec2 > c(3, 8): longer object length is not a multiple of shorter object length
## [1] FALSE FALSE FALSE FALSE TRUE FALSE TRUE
We get a warning, but R tries to do the job for us anyway, recycling the shorter vector to c(3,8,3,8,3,8,3) and then performing the comparison.
By the way, if you don’t want to see the warning you can put the expression inside the suppressWarnings() function:
suppressWarnings(vec2 > c(3, 8))
## [1] FALSE FALSE FALSE FALSE TRUE FALSE TRUE
2.5.1 Practice Exercises
We’ll work with the following vectors:
person <- c("Dorothy", "Scarecrow", "Tin Man", "Lion", "Toto")
age <- c(12, 0.04, 15, 18, 6)
likesDogs <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
Think of the vectors as having corresponding elements. Thus, there is a person named Dorothy who is 12 years old and likes dogs, a person named Scarecrow who is 0.04 years old and doesn’t like dogs, etc.
Write a Boolean expression that is TRUE when a person is less than 14 years old and FALSE otherwise.
Write a Boolean expression that is TRUE when a person is between 10 and 15 years old (including 10 but not 15) and FALSE otherwise.
Write a Boolean expression that is TRUE when a person is more than 12 years old and likes dogs, and FALSE otherwise.
Write a Boolean expression that is TRUE when a person is more than 12 years old and does not like dogs, and FALSE otherwise.
Write a Boolean expression that is TRUE when a person is more than 12 years old or likes dogs, and FALSE otherwise.
Write a Boolean expression that is TRUE when the person is Dorothy, and FALSE otherwise.
Write a Boolean expression that is TRUE when the person is Dorothy or Tin Man, and FALSE otherwise.
Write a Boolean expression that is TRUE when the person’s name comes after the letter “M” in the alphabet, and FALSE otherwise.
Write a Boolean expression that is FALSE when the person is Dorothy, and TRUE otherwise.
2.5.2 Solutions to Practice Exercises

Here’s the code:
age < 14
## [1] TRUE TRUE FALSE FALSE TRUE

Here’s the code:
age >= 10 & age < 15
## [1] TRUE FALSE FALSE FALSE FALSE

Here’s the code:
age > 12 & likesDogs
## [1] FALSE FALSE TRUE FALSE FALSE

Here’s the code:
age > 12 & !likesDogs
## [1] FALSE FALSE FALSE TRUE FALSE

Here’s the code:
age > 12 | likesDogs
## [1] TRUE FALSE TRUE TRUE TRUE

Here’s the code:
person == "Dorothy"
## [1] TRUE FALSE FALSE FALSE FALSE

Here’s the code:
person == "Dorothy" | person == "Tin Man"
## [1] TRUE FALSE TRUE FALSE FALSE

Here’s the code:
person > "M"
## [1] FALSE TRUE TRUE FALSE TRUE

Here’s the code:
person != "Dorothy"
## [1] FALSE TRUE TRUE TRUE TRUE
2.6 Subsetting with Logical Vectors
The subsetting we have seen up to now involves specifying the indices of the elements we would like to select from the original vector. It is also possible to say, for each element, whether or not it is to be included in our selection. This is accomplished by means of logical vectors.
Recall our heights vector:
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 73 70 69 60 NA 46
Let’s say that we want the heights of Scarecrow, Tinman and Dorothy. We can use a logical vector to do this:
wanted <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
heights[wanted]
## Scarecrow Tinman Dorothy
## 73 70 60
The TRUE’s at indices 1, 2, and 4 in wanted inform R that we want the heights vector at indices 1, 2 and 4. The FALSE’s say: “don’t include this element!”
Subsetting can be used powerfully along with logical vectors and Boolean operators.
For example, in order to select those persons whose heights exceed a certain amount, we might say something like this:
# heights of some people:
people <- c(55, 64, 67, 70, 63, 72)
tall <- (people >= 70)
tall
## [1] FALSE FALSE FALSE TRUE FALSE TRUE
people[tall]
## [1] 70 72
As you can see, the tall vector specifies which elements we would like to select from the people vector.
We need not define the tall vector along the way. It is quite common to see something like the following:
people[people >= 70]
## [1] 70 72
I like to pronounce the above as:
“people, where people is at least 70”
The word “where” in the above phrase corresponds to the subsetting operator.
Your subsetting logical vector need not have been constructed with the original vector in mind. Consider the following example:
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
age[height < 70]
## [1] 23 21 63
Here the selection is done from the age vector, using a logical vector that was constructed from height, another vector altogether. It concisely expresses the idea:
“the ages of people whose height is less than 70”
There is no limit to the complexity of selection. Consider the following:
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
height[age < 60 & likesToto]
## [1] 68 67
2.6.1 Counting
Logical subsetting provides a convenient way to count the elements of a vector that possess a given property. For example, to find out how many elements of people are less than 70 we could say:
length(people[people < 70])
## [1] 4
2.6.2 Cautions about NA
You should be aware of the effect of NA values on subsetting.
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 73 70 69 60 NA 46
tall <- (heights > 65)
tall
## Scarecrow Tinman Lion Dorothy Toto Boq
## TRUE TRUE TRUE FALSE NA FALSE
Since Toto’s height was missing, R can’t say whether or not he was more than 65 inches tall. Hence it assigns NA to the Toto element of the tall vector.
When we subset using this vector we get an odd result:
heights[tall]
## Scarecrow Tinman Lion <NA>
## 73 70 69 NA
Since R doesn’t know whether or not to select Toto, it records its indecision by including an NA in the result. That NA, however, is not the NA for Toto’s height in the vector heights, so it can’t inherit the “Toto” name. Since it has no name, R presents its name as <NA>.
If we try to count the number of tall persons, we get a misleading result:
length(heights[tall])
## [1] 4
We would have preferred something like:
“Three, with another one undecided.”
Counting is one of those situations in which we might wish to remove NA values at the start. If the vector is small we could remove them by hand, e.g.:
knownHeights <- heights[-5] # remove Toto
tall <- (knownHeights > 65)
length(knownHeights[tall])
## [1] 3
For longer vectors the above approach won’t be practical. Instead we may use the is.na() function:
is.na(heights)
## Scarecrow Tinman Lion Dorothy Toto Boq
## FALSE FALSE FALSE FALSE TRUE FALSE
Then we may select those elements that are not NA:
knownHeights <- heights[!is.na(heights)]
knownHeights
## Scarecrow Tinman Lion Dorothy Boq
## 73 70 69 60 46
length(knownHeights[knownHeights > 65])
## [1] 3
2.6.3 Which, Any, All
There are several functions on logical vectors that are worth keeping in your back pocket:
2.6.3.1 which()
Applied to a logical vector, the which() function returns the indices of the vector that have the value TRUE:
which(c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE))
## [1] 1 2 4
Thus if we want to know the indices of heights where the heights are more than 65, then we write:
which(heights > 65)
## Scarecrow Tinman Lion
## 1 2 3
(Recall that heights is a named vector. The logical vector heights > 65 inherited these names and passed them on to the result of which().)
Note also that Toto’s NA height was ignored by which().
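The same behaviour can be seen on a small, made-up named logical vector: which() keeps the names, and an NA is treated as “not TRUE,” so it is left out.

```r
tall <- c(Scarecrow = TRUE, Tinman = TRUE, Toto = NA, Boq = FALSE)  # hypothetical
which(tall)   # the indices 1 and 2, labelled Scarecrow and Tinman; Toto's NA is skipped
```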
2.6.3.2 any()
Is anyone more than 71 inches tall? any() will tell us:
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 73 70 69 60 NA 46
any(heights > 71)
## [1] TRUE
Yes: the Scarecrow is more than 71 inches tall.
We can use any() along with the equality Boolean operator == to determine whether or not a given value appears in a given vector:
vec <- c("Dorothy", "Tin Man", "Scarecrow")
any(vec == "Tin Man")
## [1] TRUE
any(vec == "Wizard")
## [1] FALSE
The above question occurs so frequently that R provides the %in% operator as a shortcut:
"Tin Man" %in% vec
## [1] TRUE
"Wizard" %in% vec
## [1] FALSE
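Like most of R, %in% is vectorized: the left-hand side can hold several values, and you get one TRUE or FALSE for each of them. A sketch with a made-up vector of names:

```r
vec <- c("Dorothy", "Tin Man", "Scarecrow")   # hypothetical vector of names
c("Tin Man", "Wizard", "Toto") %in% vec       # one answer per left-hand value
## [1]  TRUE FALSE FALSE
```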
2.6.3.4 NA Caution
The all() function tells us whether every element of a logical vector is TRUE. Is everyone more than 40 inches tall?
all(heights > 40)
## [1] NA
Everyone with a known height is taller than 40 inches, but because Toto’s height is NA, R can’t say whether all the heights are bigger than 40.
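When the NA is not the point of the question, all() accepts an na.rm argument that drops missing values first. Note also that a single FALSE settles the matter regardless of any NA. A sketch, with heights re-entered so it runs on its own:

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)
all(heights > 40)                  # Toto's NA leaves the answer undecided
## [1] NA
all(heights > 40, na.rm = TRUE)    # ignore the unknown height
## [1] TRUE
all(heights > 65)                  # FALSE: Dorothy alone settles it, NA or not
## [1] FALSE
```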
2.6.4 Practice Exercises
Consider the following vectors:
person <- c("Abe", "Bettina", "Candace", "Devadatta", "Esmeralda")
numberKids <- c(2, 1, 0, 2, 3)
yearsEducation <- c(12, 16, 13, 14, 18)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE)
Think of these vectors as providing information about five people.
Write a command that produces the names of people who have more than 1 child.
Write a command that produces the numbers of children of people who have a pet.
Write a command that produces the years of education of people who have at least 13 years of education.
Write a command that produces the names of people who have more than one child and fewer than 15 years of education.
Write a command that produces the names of people who don’t have pets.
Write a command that produces the number of people who have pets.
Write a command that produces the number of people who don’t have pets.
Write a command that says whether or not there is someone who has more than 15 years of education and at least one child, but doesn’t have any pets.
2.6.5 Solutions to the Practice Exercises
person[numberKids > 1]
numberKids[hasPets]
yearsEducation[yearsEducation >= 13]
person[numberKids > 1 & yearsEducation < 15]
person[!hasPets]

Here is one way. We’ll learn an easier way in the next section.
length(person[hasPets])
## [1] 2

Here is one way. We’ll learn an easier way in the next section.
length(person[!hasPets])
## [1] 3

Here is one way:
any(yearsEducation > 15 & numberKids >= 1 & !hasPets)
## [1] TRUE
2.7 Basic Arithmetical Operations on Vectors
R provides a number of arithmetical operations on pairs of numerical vectors. Table 2.2 shows the basic operators.
Operation    What It Means
x + y        addition
x - y        subtraction
x * y        multiplication
x / y        division
x^y          exponentiation (raise x to the power y)
x %/% y      integer division (quotient after dividing x by y)
x %% y       x mod y (remainder after dividing x by y)
The operators are applied element-wise to vectors:
x <- c(10, 15, 20)
y <- c(3, 4, 5)
x + y
## [1] 13 19 25
x - y
## [1] 7 11 15
x * y
## [1] 30 60 100
x / y
## [1] 3.333333 3.750000 4.000000
x^y
## [1] 1000 50625 3200000
As an illustration, the final result is:
\[10^3, 15^4, 20^5.\]
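The last two rows of the table, %/% and %%, fit together: the quotient times y plus the remainder gives back x. A sketch with x set to c(10, 15, 20) and y to c(3, 4, 5):

```r
x <- c(10, 15, 20)
y <- c(3, 4, 5)
x %/% y                    # how many whole times y goes into x
## [1] 3 3 4
x %% y                     # what is left over
## [1] 1 3 0
y * (x %/% y) + x %% y     # quotient times divisor plus remainder rebuilds x
## [1] 10 15 20
```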
The “mod” operator %% can be quite useful. Here is an example: even numbers have a remainder of 0 after division by 2, whereas odd numbers have a remainder of 1. Hence we may use %% to quickly locate the even numbers in a vector, as follows:
vec <- c(2, 7, 9, 12, 15, 24)
vec[vec %% 2 == 0]
## [1] 2 12 24
Recycling applies in vector arithmetic (as in most of R):
vec <- c(2, 7, 9, 12, 15, 24)
2 * vec # the 2 will be recycled
## [1] 4 14 18 24 30 48
vec + 100 # the 100 will be recycled
## [1] 102 107 109 112 115 124
vec^3 # the 3 will be recycled
## [1] 8 343 729 1728 3375 13824
2.7.1 More Math Functions
You have already met sqrt(). Here are a few more useful math functions involving vectors.
2.7.1.1 Rounding
You can use the round() function to round off numbers to any desired number of decimal places.
roots <- sqrt(1:5)
roots # Too much information!
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
round(roots, digits = 3) # nicer
## [1] 1.000 1.414 1.732 2.000 2.236
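The sketch below uses made-up numbers to illustrate two further details about round(). First, a negative digits value rounds to tens, hundreds and so on. Second, according to R’s documentation, a 5 that lies exactly halfway between two candidates is expected to follow the IEC 60559 rule of going to the even digit. As a result, 0.5 and 2.5 round down, while 1.5 and 3.5 round up.

```r
# Negative digits: round to the nearest hundred
round(1234.567, digits = -2)     # 1200

# Halfway cases go to the even neighbour
round(c(0.5, 1.5, 2.5, 3.5))     # 0 2 2 4
```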
2.7.1.2 Ceiling and Floor
The ceiling() function returns the least integer that is greater than or equal to the given number:
## [1] -2 -1  1  2  3  5
The floor() function returns the greatest integer that is less than or equal to the given number:
floor(vec)
## [1] -3 -2  0  1  3  4
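Here is a self-contained sketch that uses a hypothetical vector v. For negative numbers, ceiling() moves toward zero and floor() moves away from it. A whole number such as 3 is left unchanged by both.

```r
# Hypothetical vector with negative, fractional and whole values
v <- c(-2.5, -1.2, 0.3, 1.7, 3, 4.6)

ceiling(v)   # -2 -1  1  2  3  5
floor(v)     # -3 -2  0  1  3  4
```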
2.7.1.3 Vectorization
All of the above operations follow the “vector-in, vector-out” principle, which R users often call vectorization and to which R generally adheres. Vectorization lets us express ideas concisely and in human-readable fashion, and the computations themselves also tend to run very quickly.
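To see what vectorization saves us, compare a single call to sqrt() with an explicit loop that computes the same values one element at a time. The loop uses for, a construct not yet covered here. Treat it as a conceptual picture of the work that a vectorized function takes off our hands, not as code you would normally write.

```r
nums <- 1:5

# Vectorized: one call handles every element
vectorized <- sqrt(nums)

# Element-by-element loop producing the same result
looped <- numeric(length(nums))
for (i in seq_along(nums)) {
  looped[i] <- sqrt(nums[i])
}

identical(vectorized, looped)   # TRUE
```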
2.7.1.4 Summing and the Mean
There are some functions on vectors that return only a vector of length 1. Among examples we have met so far are:
Another very important function that returns a vector of length 1 is sum():
vecs <- 1:100
sum(vecs)
## [1] 5050
In statistics we are often interested in the mean of a list of numbers. The mean is defined as:
\[\frac{\text{sum of the numbers}}{\text{how many numbers there are}}\]
You can find the mean of a numerical vector as follows:
The way we compute the mean in R looks a great deal like its mathematical definition.
You might be interested to know that there is a function in R dedicated to finding the mean. Unsurprisingly, it is called mean():
mean(vec)
## [1] 18.4
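As a check, the definition and the built-in function agree. The sketch below uses the vector c(2, 7, 9, 12, 15, 24) from the earlier examples.

```r
v <- c(2, 7, 9, 12, 15, 24)

sum(v) / length(v)   # straight from the definition: 11.5
mean(v)              # built-in function: 11.5
```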
2.7.1.5 Maximum and Minimum
The max() function delivers the maximum value of the elements of a numerical vector:
## [1] 7
The min() function delivers the minimum value of a numerical vector:
## [1] 2
You can enter more than one vector into min() or max(): the function will combine the vectors and then do its job:
## [1] 15
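For example, with two short illustrative vectors a and b, max() and min() behave as though the vectors had first been combined with c():

```r
a <- c(3, 7)
b <- c(15, 2)

max(a, b)                   # 15
min(a, b)                   # 2
max(a, b) == max(c(a, b))   # TRUE
```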
Both functions yield NA when one of the elements is NA:
max(3, 7, 2, NA)
## [1] NA
Like sum() and mean(), they respond to the na.rm parameter:
max(3, 7, 2, NA, na.rm = TRUE)
## [1] 7
The pmax() function compares corresponding elements of each input vector and produces a vector of the maximum values:
## [1] 5 7 12
There is a pmin() function that computes pairwise minima as well.
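The short sketch below uses two illustrative vectors to show the contrast with max(). pmax() and pmin() work position by position and return a vector, whereas max() pools everything into a single value.

```r
first  <- c(1, 7, 3)
second <- c(5, 2, 12)

pmax(first, second)   # 5  7 12
pmin(first, second)   # 1  2  3
max(first, second)    # 12
```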
2.7.2 NA and NaN Considerations
What happens when you are doing mathematics on a vector, one of whose values is NA? A vectorizing function will simply pass it along:
vec <- c(1, 2, 3, 4, NA)
sqrt(vec)
## [1] 1.000000 1.414214 1.732051 2.000000 NA
On the other hand, a function like sum() needs to know all of the values. If one of them is NA, it will report their sum as NA.
sum(vec)
## [1] NA
The same is true for the mean:
mean(vec)
## [1] NA
If we want the sum or the mean of the known values, we could first remove the NA values as demonstrated in previous sections. We could also make use of the na.rm parameter that these functions provide:
sum(vec, na.rm = TRUE)
## [1] 10
mean(vec, na.rm = TRUE)
## [1] 2.5
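The two approaches give the same answer. The sketch below assumes a vector with one missing value, c(1, 2, 3, 4, NA). It removes the NA with is.na() before summing.

```r
vec <- c(1, 2, 3, 4, NA)

# Approach 1: keep only the known values, then sum
known <- vec[!is.na(vec)]
sum(known)               # 10

# Approach 2: let sum() skip the NA itself
sum(vec, na.rm = TRUE)   # 10
```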
The results of some arithmetical operations are not defined. (Examples: 0 divided by 0; the square root of a negative number.) R reports the results of such operations as NaN, “not a number.” When you take the square root of a negative number, R also issues a warning:
sqrt(c(-4, 2, 4))
## Warning in sqrt(c(-4, 2, 4)): NaNs produced
## [1] NaN 1.414214 2.000000
Keep in mind, though, that the result is a perfectly good vector as far as R is concerned. After the warning, R will permit you to use it in further computations:
vec <- sqrt(c(-4, 2, 4))
## Warning in sqrt(c(-4, 2, 4)): NaNs produced
vec + 3
## [1] NaN 4.414214 5.000000
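It helps to keep NaN, Inf and NA apart. The sketch below shows that 0/0 is NaN and that dividing a nonzero number by 0 gives Inf or -Inf rather than NaN. It also shows that is.na() treats NaN as missing, while is.nan() picks out NaN alone.

```r
0 / 0     # NaN
1 / 0     # Inf
-1 / 0    # -Inf

x <- c(1, NA, NaN, Inf)
is.na(x)    # FALSE  TRUE  TRUE FALSE
is.nan(x)   # FALSE FALSE  TRUE FALSE
```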
2.7.3 Practice Exercises
Consider the following vectors:
Write a command that produces the squares of the first 10 whole numbers.
Write a command that produces the square roots of: the numbers from 1 to 100 that are one more than a multiple of 3.
Write a command that raises 2 to the second power, 3 to the third power, 4 to the fourth power, … up to 100 to the hundredth power.
Using the sum() function and the vector hasPets from the practice exercises of the previous section, write a command that says how many people have pets.
Using the sum() function and the vector hasPets from the practice exercises of the previous section, write a command that says how many people do not have pets.
Using the vectors from the practice exercises of the previous section, find the name of the person who has the most education.
2.7.4 Solutions to the Practice Exercises
(1:10)^2

Here are a couple of ways:
sqrt(seq(1, 100, by = 3))
sqrt((1:100)[(1:100) %% 3 == 1])
(2:100)^(2:100)

When given a logical vector, the sum() function converts TRUE to 1 and FALSE to 0, and then adds. Accordingly, you can count how many people have pets like this:
sum(hasPets)
## [1] 3

Do this:
sum(!hasPets)
## [1] 3

Try this:
person[yearsEducation == max(yearsEducation)]
## [1] "Esmeralda"
2.8 Further Notes on Syntax
In the process of learning about R, you have been unconsciously imbibing some of its syntax. The syntax of a computer programming language is the complete set of rules that determine what combinations of symbols are considered to make a well-formed program in the language: something that R can interpret and attempt to execute.
2.8.1 Syntax Errors vs. Runtime Errors vs. Semantic Errors
For the most part you will learn the syntax informally. By now, for example, you have probably realized that when you call a function you have to supply a closing parenthesis to match the open parenthesis. Thus the following is completely fine:
sum(1:5)
## [1] 15
On the other hand, if you were to type sum(1:5 alone on a single line in an R script, R Studio’s code-checker would show a red warning circle at that line. Hovering over the circle, you would see the message:
unmatched opening bracket '('
If you were to attempt to run the command sum(1:5 from the script, you would get the following error message:
## Error: Incomplete expression: sum(1:5
Such an error is called a syntax error.^{4} The R Studio IDE can detect most—but not all—syntax errors.
Syntax errors in computer programming are similar to grammatical errors in ordinary language, such as:
 “Mice is scary.” (Number of the subject does not match the number of the verb.)
 “Mice are.” (Incomplete expression.)
A runtime error is an error that occurs when the syntax is correct but R is unable to finish the execution of your code for some other reason. The following code, for example, is perfectly fine from a syntactical point of view:
sum("hello")
When run, however, it produces an error:
## Error in sum("hello") : invalid 'type' (character) of argument
Here is another example:
sum(emeraldCity)
Unless for some reason you have defined the variable emeraldCity, an attempt to run the above command will produce the following runtime error:
## Error: object 'emeraldCity' not found
Many runtime errors in computer programming resemble errors in ordinary language where the sentence is grammatically correct but does not mean anything, as in:
 “Beelbubs are juicy.” (What’s a “beelbub?”)
There is a third type of error, known in the world of programming as a semantic error. The term “semantics” refers to the meaning of things. Computer code is said to contain a semantic error when it is syntactically correct and can be executed, but does not deliver the results one intended.
As an example, suppose you have defined, at some point, two variables:
emeraldCity <- 15
emeraldcity <- 4
Suppose now that, wanting R to compute \(15^2\), you run the following code:
emeraldcity^2
## [1] 16
You don’t get the results you wanted, because you accidentally asked for the square of the wrong number.
Semantic errors are usually the most difficult errors for programmers to detect and repair.
2.8.2 The Assignment Operator
We have been using the assignment operator <- to assign values to variables. You should be aware that there is another assignment operator that works the other way around:
4 -> emeraldCity
emeraldCity
## [1] 4
Most people don’t use it.
A popular alternative to <- as an assignment operator is the equals sign (=):
emeraldCity = 5
emeraldCity
## [1] 5
I myself prefer to stay away from it, as it can be confused with other uses of =, such as the setting of values to parameters in functions:
rep("Dorothy", times = 3)
## [1] "Dorothy" "Dorothy" "Dorothy"
When you have to assign the same value to several variables, R allows you to abbreviate a bit. Consider the following code:
a <- b <- c <- 5
The above code has the same effect as:
a <- 5
b <- 5
c <- 5
2.8.3 Multiple Expressions
R allows you to write more than one expression on a single line, as long as you separate the expressions with semicolons:
a <- b <- c <- 5
a; b; c; 2+2; sum(1:5)
## [1] 5
## [1] 5
## [1] 5
## [1] 4
## [1] 15
2.8.4 Variable Names and Reserved Words
Using the assignment operator, we have created quite a few variables by now, and we appear to have named them whatever we want. In fact there are very few limitations on the name of a variable. According to R’s own documentation:^{5}
“A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.”
This leaves a lot of room for creativity. All of the following names are possible for variables:
yellowBrickRoad
yellow_brick_road
yellow.brick.road
yell23
y2e3L45Lo3....rOAD
.yellow
The following, though, are not valid:
.2yellow (cannot start with a dot followed by a number)
_yellow (cannot start with _)
5scones (cannot start with a number)
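If you are ever unsure whether a name is valid, base R’s make.names() can help. It returns valid names unchanged and repairs invalid ones, for example by prepending X. The helper is_valid_name() below is a hypothetical convenience built on make.names(); it is not a built-in function. R’s documentation for make.names() notes that names matching reserved words (discussed below) get a dot appended, so those names fail this check too.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(x) make.names(x) == x

is_valid_name(c("yellowBrickRoad", "yellow_brick_road", "yellow.brick.road",
                "yell23", "y2e3L45Lo3....rOAD", ".yellow"))   # all TRUE

is_valid_name(c(".2yellow", "_yellow", "5scones"))          # all FALSE
make.names(c(".2yellow", "_yellow", "5scones"))
# "X.2yellow" "X_yellow"  "X5scones"

is_valid_name("break")   # FALSE: make.names() gives "break."
```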
Most programmers try to devise descriptive names for variables: names that suggest to a reader of the code the role the variable plays within it. In addition, they try to stick to a consistent system for names that divide naturally into meaningful words.
One popular convention is known as CamelCase. In this convention each new word-like part of the variable name begins with a capital letter. (The initial letter, though, is often not capitalized.) Examples would be:
emeraldCity
isEven
Another popular convention, sometimes called “snake case,” is to use lowercase letters and to separate words with underscores:
emerald_city
is_even
An older convention—one that was popular among some of the original developers of R—was to separate words with dots:
emerald.city
is.even
This last convention is no longer recommended, as in programming languages other than R the dot is associated syntactically with the calling of a “method” on an “object.”^{6}
There is one further restriction on variable names that we have not yet mentioned: you are not allowed to use any of R’s reserved words. These are:
if, else, while, repeat, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_, as well as ... and ..1, ..2, and so on.
You need not memorize the above list. You’ll gradually learn most of it, and the words you don’t learn are words you are unlikely ever to choose as a variable name on your own. Besides, reserved words show up in blue in the R Studio editor, and if you manage to use one anyway, R will stop you outright with a clear error message:
break <- 5
## Error in break <- 5 : invalid (NULL) left side of assignment
Notice also that although TRUE and FALSE are reserved words, their accepted abbreviations T and F are not. This can lead to problems in code if someone chooses to bind T or F to some value.
For example, suppose that you have two lines of code like this:
T <- 0
F <- 1
Later on, suppose you create what you think is a logical vector:
myVector <- c(T, F, F, T)
But it’s not logical:
typeof(myVector)
## [1] "double"
That’s because T and F have been bound to numerical values. If you coerce myVector to a logical vector, you get the exact opposite of what you would have expected:
as.logical(myVector)
## [1] FALSE TRUE TRUE FALSE
The moral of the story is: don’t use T for TRUE or F for FALSE.
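If T or F has been rebound by accident, one remedy, sketched below, is to remove your own bindings with rm(). T and F then refer once again to the versions supplied by R, which stand for TRUE and FALSE. Writing TRUE and FALSE in full avoids the issue entirely, because reserved words cannot be rebound.

```r
T <- 0
F <- 1
myVector <- c(T, F, F, T)
typeof(myVector)   # "double"

# Remove the user-created bindings; R's own T and F become visible again
rm(T, F)
myVector <- c(T, F, F, T)
typeof(myVector)   # "logical"
```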
One final remark: variables together with reserved words constitute the part of the R language called identifiers.
2.8.5 Practice Exercises
Suppose that you begin a new R session and that you run the following code:
person <- c("Abe", "Bettina", "Candace", "Devadatta", "Esmeralda")
numberKids <- c(2, 1, 0, 2, 3)
yearsEducation <- c(12, 16, 13, 14, 18)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)

What sort of error (syntax, runtime or semantic) is produced by this next piece of code, which is intended to produce the names of the people with more than 15 years of education? Why?
person(yearsEducation > 15]

What sort of error (syntax, runtime or semantic) is produced by this next piece of code, which is intended to produce the names of the people who don’t have pets? Why?
person[!haspets]

What sort of error (syntax, runtime or semantic) is produced by this next piece of code, which is intended to find out how many people have pets? Why?
length(hasPets)
Is ThreeLittlePigs a valid name for a variable? If not, why not?
Is 3LittlePigs a valid name for a variable? If not, why not?
Is LittlePigs3 a valid name for a variable? If not, why not?
Is Little-Pigs-3 a valid name for a variable? If not, why not?
Is Little_Pigs_3 a valid name for a variable? If not, why not?
Is three.little.pigs a valid name for a variable? If not, why not?
2.8.6 Solutions to the Practice Exercises

This will result in a syntax error. You need brackets to select, and you’ve got a parenthesis on the left. The correct syntax would be:
person[yearsEducation > 15]

This will result in a runtime error. The variable haspets is not defined, so R will issue an “object not found” error when the code is executed. Probably you meant:
person[!hasPets]

This will result in a semantic error. You’ll get the number of elements in the vector hasPets, not the number of people who have pets. What you want can be accomplished by either one of the following:
length(person[hasPets])
sum(hasPets)

Yes.
No! Variable names can’t start with a number.
Yes.
No! Hyphens aren’t allowed in variable names.
Yes.
Yes.
Glossary
 Vector Type

Any one of the six basic forms the elements in an atomic vector can take. The four types we will encounter the most are: double, integer, character and logical.
 Coercion

The process of changing a vector from one type to another. Sometimes the process takes place automatically, as a convenience to the programmer.
 Subsetting

The operation of selecting one or more elements from a vector.
 Recycling

An automatic process by which R, when given two vectors, repeats elements of the shorter vector until it is as long as the longer vector. Recycling enables the two resulting vectors to be combined element-wise in operations.
 Vectorization

R’s ability to operate on each element of a vector, producing a new vector of the same length. Vectorized operations can be expressed concisely and performed very quickly.
 Reserved Words

Identifiers that are set aside by R for specific programming purposes. They cannot be used as names of variables.
 Syntax

The complete set of rules for a computer language that determine what combinations of symbols are considered to make a well-formed program in the language.
 Syntax Error

A sequence of symbols that contains a violation of one of the rules of syntax. R is unable to interpret and attempt to execute code that contains a syntax error.
 Runtime Error

An error that occurs when the computer language’s interpreter attempts to execute code but is unable to do so. A typical cause of a runtime error is the situation when the code calls for the evaluation of a name that has not been bound to an object.
 Semantic Error

An error in code that is syntactically correct and that can be executed by the computer but which produces unexpected results.
Exercises

Determine the type of each of the following vectors:

Using a combination of c(), rep() and seq() and other operations, find concise one-line programs to produce each of the following vectors:
 all numbers from 4 to 307 that are one more than a multiple of 3;
 the numbers 0.01, 0.02, 0.03, …, 0.98, 0.99.
 twelve 2’s, followed by twelve 4’s followed by twelve 6’s, …, followed by twelve 10’s, finishing with twelve 12’s.
 one 1, followed by two 2’s, followed by three 3’s, …, followed by nine 9’s, finishing with ten 10’s.

Using a combination of c(), rep() and seq() and other operations, find concise one-line programs to produce each of the following vectors:
 the numbers 15, 20, 25, …, 145, 150.
 the numbers 1.1, 1.2, 1.3, …, 9.8, 9.9, 10.0.
 ten A’s followed by ten B’s, …, followed by ten Y’s and finishing with ten Z’s. (Hint: the special vector LETTERS will be useful.)
 one a, followed by two b’s, followed by three c’s, …, followed by twenty-five y’s, finishing with twenty-six z’s. (Hint: the special vector letters will be useful.)

The following four vectors give the names, ages and heights of five people, and also say whether or not each person likes Toto:
person <- c("Akash", "Bee", "Celia", "Devadatta", "Enid")
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
Use subsetting with logical vectors to produce vectors of:
 the names of all people over the age of 22;
 the names of all people younger than 24 who are also more than 67 inches tall;
 the names of all people who either don’t like Toto or who are over the age of 30;
 the number of people who are over the age of 22.

Consider the four vectors defined in the previous problem. Use subsetting with logical vectors to produce vectors of:
 the names of all people who are less than 70 inches tall;
 the names of all people who are between 20 and 30 years of age (not including 20 or 30);
 the names of all people who either like Toto or who are under the age of 50;
 the number of people who are more than 69 inches tall.

Logical vectors are not numerical vectors, so it would seem that you should not be able to sum their elements. But:
sum(likesToto)
results in the number 3! What is happening here is that R coerces the logical vector likesToto into a numerical vector of 1’s and 0’s (1 for TRUE, 0 for FALSE) and then sums the resulting vector. Notice that this gives us the number of people who like Toto. With this idea in mind, use sum() along with logical vectors to find:
 the number of people younger than 24 who are also more than 67 inches tall;
 the number of people who either don’t like Toto or who are over the age of 30.

Read the previous problem, and then use sum() along with logical vectors to find:
 the number of people between 65 and 70 inches tall (including 65 and 70);
 the number of people who either don’t like Toto or who are under the age of 25.