2.1 What is a Vector?
If you have heard of vectors before in mathematics, you might think of a vector as something that has a magnitude and a direction, and that can be represented by a sequence of numbers. In its notion of a vector, R keeps the idea of a sequence but discards magnitude and direction. The notion of “numbers” isn’t even necessary.
For R, a vector is simply a sequence of elements. There are two general sorts of vectors:
- atomic vectors that come in one of six forms called vector types;
- non-atomic vectors, called lists, whose elements can be any sort of R-object at all.
For now we’ll just study atomic vectors. Let’s make a few vectors, as examples.
We can make a vector of numbers using the c() function:
numVec <- c(23.2, 45, 631, -273, 0, 48.371, 100000,
            85, 92, -236, 8546, 98774, 0, 0, 1, 3)
numVec
## [1] 23.200 45.000 631.000 -273.000 0.000 48.371 100000.000
## [8] 85.000 92.000 -236.000 8546.000 98774.000 0.000 0.000
## [15] 1.000 3.000
You can think of c as standing for “combine.” c() takes its arguments, all of which are separated by commas, and combines them to make a vector.
If you examine the output above closely, you’ll notice that R printed all of the numerical values in the vector to three decimal places, which happens to be the largest number of decimal places among the numbers that make up numVec. You’ll also notice the numbers in brackets at the beginning of each line. Each one gives the position within the vector of the first element printed on that line. The position of an element in a vector is called its index. Reporting the indices of leading elements helps you locate particular elements in the output.
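To connect the bracketed numbers with actual positions, here is a small sketch. It uses R’s square-bracket subsetting operator, which this section has not introduced yet, to pull out the elements at indices 8 and 15; in the printout of numVec above, these are the first elements on the second and third lines. (Where the lines break depends on the width of your console, so your own output may begin its lines at different indices.)

```r
numVec <- c(23.2, 45, 631, -273, 0, 48.371, 100000,
            85, 92, -236, 8546, 98774, 0, 0, 1, 3)
numVec[8]   # the element at index 8, first on the line marked [8]: 85
numVec[15]  # the element at index 15, first on the line marked [15]: 1
```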
2.1.1 Types of Atomic Vectors
The numbers in numVec are what programmers call double-precision numbers. You can verify this for yourself with the typeof() function:
typeof(numVec)
## [1] "double"
The typeof() function returns the type of any object in R. As far as atomic vectors are concerned, there are six possible types, of which we will deal with only four:
- double
- integer
- character
- logical
The remaining two types are complex and raw.
Let’s look at examples of the other types. Here is a vector of type integer:
intVec <- c(3L, 17L, -22L, 45L)
intVec
## [1] 3 17 -22 45
The L after each number signifies to R that the number should be stored in memory as an integer, rather than in double-precision format. Officially, the type is integer:
typeof(intVec)
## [1] "integer"
You should know that if you leave off even one of the L’s, R creates a vector of type double:
numVec2 <- c(3, 17, -22, 45)
typeof(numVec2)
## [1] "double"
We won’t work much with integer-type vectors, but you’ll see them out in the wild.
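A quick way to see that the L changes how a number is stored, and not what it equals, is to compare a double and an integer with the same value. This sketch uses the == comparison operator and the identical() function, neither of which has been introduced yet; identical() reports TRUE only when two objects match exactly, type included.

```r
x <- 3           # no L: stored as a double
y <- 3L          # L suffix: stored as an integer
typeof(x)        # "double"
typeof(y)        # "integer"
x == y           # TRUE: the two values are equal...
identical(x, y)  # FALSE: ...but their types differ
```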
We can also make vectors out of pieces of text called strings: these are called character vectors. As noted in the previous chapter, we use quotes to delimit strings:
strVec <- c("Brains", "are", "not", "the", "best",
            "things", "in", "the", "world", "93.2")
strVec
## [1] "Brains" "are" "not" "the" "best" "things" "in" "the" "world"
## [10] "93.2"
typeof(strVec)
## [1] "character"
Notice that "93.2" makes a string, not a number, because it is enclosed in quotes.
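The quotes are what make the difference. In this small sketch (the variable names quoted and unquoted are just for illustration), typeof() reports the same digits as character inside quotes and as double without them, and only the unquoted version can take part in arithmetic.

```r
quoted   <- "93.2"   # in quotes: a string
unquoted <- 93.2     # no quotes: a number
typeof(quoted)       # "character"
typeof(unquoted)     # "double"
unquoted + 1         # 94.2
# quoted + 1 would stop with an error: arithmetic is not defined for strings
```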
The last type of vector to consider is logical. Here is an example:
logVec <- c(TRUE, FALSE, T, T, F, F, FALSE)
logVec
## [1] TRUE FALSE TRUE TRUE FALSE FALSE FALSE
To represent a logical value you can use:
- TRUE or T to represent truth;
- FALSE or F to represent falsity.
You can’t write truth or falsity any other way. If you try anything else, like the following, you get an error:
badVec <- c(TRUE, false)
## Error: object 'false' not found
Note: Although R allows T to be interpreted as TRUE and F as FALSE, using them can be dangerous: T and F are ordinary variables that can be reassigned, while TRUE and FALSE are reserved words that cannot. Best to get into the habit of always using TRUE and FALSE, rather than the permitted abbreviations.
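Here is a sketch of what can go wrong once something else is stored in T. The name mixed is just for illustration, and rm() (not introduced in this section) removes a variable you created.

```r
T <- 0               # allowed: T is an ordinary variable, not a reserved word
mixed <- c(T, TRUE)
mixed                # 0 1: T no longer stands for truth
rm(T)                # remove our T, so R's built-in T (which is TRUE) is found again
T                    # TRUE
# TRUE <- 0          # would be an error: TRUE is a reserved word
```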
2.1.2 Coercion
What would happen if you tried to represent falsity with the string "false"?
newVector <- c(TRUE, "false")
newVector
## [1] "TRUE" "false"
newVector is not a logical vector. Check it out:
typeof(newVector)
## [1] "character"
To understand what happened here, recall that all of the elements of an atomic vector must be of the same type. If the c() function is presented with values of different types, R follows a set of internal rules to coerce some of the values to a new type so that all resulting values have the same type. You don’t need to know all of the coercion rules, but it’s worth noting that character beats double, which in turn beats integer, which in turn beats logical.
The following examples show this:
typeof(c("one", 1, 1L, TRUE))
## [1] "character"
typeof(c(1, 1L, TRUE))
## [1] "double"
typeof(c(1L, TRUE))
## [1] "integer"
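typeof() tells you which type won, but it can also help to see what happens to the values themselves. The following sketch follows the same ordering; the comments describe what R prints. Note in particular that TRUE becomes 1 and FALSE becomes 0 when they are coerced to numbers.

```r
c(1L, TRUE)         # logical -> integer: 1 1
c(2.5, FALSE)       # logical -> double:  2.5 0.0
c(2.5, 7L)          # integer -> double:  2.5 7.0
c("a", 2.5, TRUE)   # everything -> character: "a" "2.5" "TRUE"
```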
Automatic coercion can be convenient in some circumstances, but in others it can give unexpected results. It’s best to keep track of what types you are dealing with and to exercise caution when combining values to make new vectors.
You can also coerce vectors “manually” with the functions:
- as.numeric();
- as.integer();
- as.character();
- as.logical().
Here are some examples:
numVec <- c(3, 2.5, -7.32, 0)
as.character(numVec)
## [1] "3" "2.5" "-7.32" "0"
as.integer(numVec)
## [1] 3 2 -7 0
as.logical(numVec)
## [1] TRUE TRUE TRUE FALSE
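Note that in coercion from numerical to logical, the number 0 becomes FALSE and all non-zero numbers become TRUE. Note also that as.integer() drops the fractional part (truncating toward zero), so 2.5 becomes 2 and -7.32 becomes -7.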
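Coercion works in other directions too. This sketch turns logical values into numbers and strings into logical values. as.logical() recognizes spellings such as "TRUE", "true", "T", "FALSE", "false" and "F", and returns NA for a string it does not recognize, such as "yes".

```r
as.numeric(c(TRUE, FALSE))                  # 1 0
as.logical(c("TRUE", "false", "F", "yes"))  # TRUE FALSE FALSE NA
```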
2.1.3 Combining Vectors
You can combine vectors you have already created to make new, bigger ones:
numVec1 <- c(5, 3, 10)
numVec2 <- c(1, 2, 3, 4, 5, 6)
numCombined <- c(numVec1, numVec2)
numCombined
## [1] 5 3 10 1 2 3 4 5 6
You can see here that vectors are different from sets: the same value can appear at different indices, as with the 3’s and 5’s above.
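c() treats single values and longer vectors alike, so you can mix them in one call and even include the same vector more than once. The name bigger is just for illustration:

```r
numVec1 <- c(5, 3, 10)
bigger <- c(0, numVec1, numVec1, 99)  # single values and vectors mixed together
bigger                                # 0 5 3 10 5 3 10 99
```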
2.1.4 NA Values
Consider the following vector, which we may think of as recording the heights of people, in inches:
heights <- c(72, 70, 69, 58, NA, 45)
The NA in the fifth position of the vector is a special value, conventionally read as “Not Available.” It’s R’s way of letting us indicate that a value was not recorded or has gone missing for some reason.
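NA values behave in a particular way in calculations. In this sketch, is.na() flags the missing entry, mean() returns NA because one of the values is unknown, and the na.rm = TRUE argument tells mean() to drop missing values first. (is.na(), mean() and na.rm are standard R, but this section has not introduced them.)

```r
heights <- c(72, 70, 69, 58, NA, 45)
is.na(heights)               # FALSE FALSE FALSE FALSE TRUE FALSE
mean(heights)                # NA: an average that includes an unknown value is unknown
mean(heights, na.rm = TRUE)  # 62.8, the mean of the five recorded heights
```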
2.1.5 “Everything in R is a Vector”
Some folks say that everything in R is a vector. That’s a bit of an exaggeration, but it’s remarkably close to the truth.
And yet it seems implausible. What about the elements of an atomic vector, for instance? A single element doesn’t look at all like a vector: it’s a value, not a sequence of values.
Or so we might think. But really, in R there are no “single values” that can exist by themselves. Consider, for instance, what we think of as the number 17:
17
## [1] 17
See the [1] in front, in the output above? It indicates that the line begins with the first element of a vector. So 17 doesn’t exist on its own: it exists as a vector of type double, a vector of length 1.
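You can check this directly. The sketch below uses length(), which reports how many elements a vector has, along with is.vector() and identical(), two standard functions not otherwise used in this section:

```r
length(17)            # 1: a vector with a single element
is.vector(17)         # TRUE
identical(17, c(17))  # TRUE: 17 and c(17) are the same thing
```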
Even NA is, all along, a vector of length 1:
NA
## [1] NA
It is of type logical:
typeof(NA)
## [1] "logical"
Note that even the type of NA evaluates, in R, to a vector: a character vector of length 1 whose only element is the string "logical".
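Applying typeof() and length() to the result of typeof(NA) confirms this:

```r
typeof(NA)           # "logical"
typeof(typeof(NA))   # "character"
length(typeof(NA))   # 1
```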
2.1.6 Named Vectors
The elements of a vector can have names, if we like:
ages <- c(Bettina = 32, Chris = 64, Ramesh = 101)
ages
## Bettina Chris Ramesh
## 32 64 101
Having names doesn’t keep the vector from being a vector of type double: it is double because its elements are double.
typeof(ages)
## [1] "double"
We can name the elements of a vector when we create it with c(), or we can name them later on. One way to do this is with the names() function:
names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
heights
## Scarecrow Tinman Lion Dorothy Toto Boq
## 72 70 69 58 NA 45
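The names() function also works in the other direction: called on its own, it returns the names as a character vector. In this sketch, assigning NULL (R’s empty object, not introduced in this section) removes the names from a copy, and neither step changes the type. The names heightNames and unnamed are just for illustration.

```r
heights <- c(72, 70, 69, 58, NA, 45)
names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
heightNames <- names(heights)  # retrieve the names as a character vector
heightNames
typeof(heights)                # still "double"
unnamed <- heights             # work on a copy so heights keeps its names
names(unnamed) <- NULL         # remove the names
unnamed                        # 72 70 69 58 NA 45
```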
2.1.7 Special Character Vectors
R comes with two handy, predefined character vectors:
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u"
## [22] "v" "w" "x" "y" "z"
LETTERS
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"
## [22] "V" "W" "X" "Y" "Z"
We will make use of them from time to time.
2.1.8 Length of Vectors
The length() function tells us how many elements a vector has:
length(heights)
## [1] 6
2.1.9 Practice Exercises
1. Consider the following vector:
upperLower <- c(LETTERS, letters)
What should the length of upperLower be? Check your answer using the length() function.
2. True or False: c("a", 2, TRUE) yields a vector of length three consisting of the string "a", the number 2 and the logical value TRUE.
3. The function as.numeric() tries to coerce its input into numbers. How well can it pick out the “numbers” in strings? Try the following calls. When did as.numeric() find the number that was probably intended?
as.numeric("3.214")
as.numeric("3L")
as.numeric("fifty")
as.numeric("10 + 3")
as.numeric("3.25e-3") # scientific notation: 3.25 times 10^(-3)
as.numeric("31,245")
2.1.10 Solutions to Practice Exercises
1. There are 26 letters, so the length of upperLower should be \(2 \times 26 = 52\). Let’s check:
length(upperLower)
## [1] 52
2. False! The resulting vector is atomic, so all of its elements must have the same type. The number and the logical value are coerced to strings, yielding:
c("a", 2, TRUE)
## [1] "a" "2" "TRUE"
3. as.numeric() isn’t very smart: it picked out the numbers in "3.214" and "3.25e-3", but in every other case it returned NA (with a warning that NAs were introduced by coercion).
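Because as.numeric() works on whole vectors, you can run all of the exercise’s calls at once. This sketch wraps the call in suppressWarnings() only to silence the “NAs introduced by coercion” warning; the names inputs and results are just for illustration.

```r
inputs <- c("3.214", "3L", "fifty", "10 + 3", "3.25e-3", "31,245")
results <- suppressWarnings(as.numeric(inputs))
results                   # numbers for "3.214" and "3.25e-3", NA everywhere else
inputs[!is.na(results)]   # the strings it could convert: "3.214" "3.25e-3"
```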