2.1 What is a Vector?

If you have heard of vectors before in mathematics, you might think of a vector as something that has a magnitude and a direction, and that can be represented by a sequence of numbers. In its notion of a vector, R keeps the idea of a sequence but discards magnitude and direction. The notion of “numbers” isn’t even necessary.

For R, a vector is simply a sequence of elements. There are two general sorts of vectors:

  • atomic vectors that come in one of six forms called vector types;
  • non-atomic vectors, called lists, whose elements can be any sort of R-object at all.

For now we’ll just study atomic vectors. Let’s make a few vectors, as examples.

We can make a vector of numbers using the c() function:

numVec <- c(23.2, 45, 631, -273, 0, 48.371, 100000,
            85, 92, -236, 8546, 98774, 0, 0, 1, 3)
numVec
##  [1]     23.200     45.000    631.000   -273.000      0.000     48.371 100000.000
##  [8]     85.000     92.000   -236.000   8546.000  98774.000      0.000      0.000
## [15]      1.000      3.000

You can think of c as standing for “combine.” c() takes its arguments, all of which are separated by commas, and combines them to make a vector.

If you closely examine the above output, you’ll notice that R printed out all of the numerical values in the vector to three decimal places, which happened to be the largest number of decimal places we assigned to any of the numbers that made up numVec. You’ll also notice the numbers in brackets at the beginning of the lines. Each number represents the position within the vector occupied by the first element of the vector that is printed on the line. The position of an element in a vector is called its index. Reporting the indices of leading elements helps you locate particular elements in the output.
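As a small sketch of what an index means, the following uses R's square-bracket operator, which this section has not introduced, to pull out single elements by position. The names eighth and secondIsFortyFive are made up for this illustration.

```r
numVec <- c(23.2, 45, 631, -273, 0, 48.371, 100000,
            85, 92, -236, 8546, 98774, 0, 0, 1, 3)

# The element at index 8 is the one that leads the second line
# of the printed output shown above.
eighth <- numVec[8]
eighth   # 85

# Printing pads every value to three decimal places for display only;
# the stored value at index 2 is still exactly 45.
secondIsFortyFive <- numVec[2] == 45
secondIsFortyFive   # TRUE
```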

2.1.1 Types of Atomic Vectors

The numbers in numVec are what programmers call double-precision numbers. You can verify this for yourself with the typeof() function:

typeof(numVec)
## [1] "double"

The typeof() function returns the type of any object in R. As far as vectors are concerned, there are six possible types, of which we will deal with only four:

  • double
  • integer
  • character
  • logical

Let’s look at examples of the other types. Here is a vector of type integer:

intVec <- c(3L, 17L, -22L, 45L)
intVec
## [1]   3  17 -22  45

The L after each number signifies to R that the number should be stored in memory as an integer, rather than in double-precision format. Officially, the type is integer:

typeof(intVec)
## [1] "integer"

You should know that if you left off one or more of the L’s, then R would create a vector of type double:

numVec2 <- c(3, 17, -22, 45)
typeof(numVec2)
## [1] "double"

We won’t work much with integer-type vectors, but you’ll see them out in the wild.
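To see that the distinction between the two numeric types concerns storage rather than value, here is a minimal sketch comparing 3L with 3. The variable names are chosen only for this example.

```r
asInteger <- 3L   # stored as an integer
asDouble  <- 3    # stored in double-precision format

typeof(asInteger)   # "integer"
typeof(asDouble)    # "double"

# The two are numerically equal ...
sameValue <- asInteger == asDouble
sameValue           # TRUE

# ... but they are not identical objects, because their types differ.
sameObject <- identical(asInteger, asDouble)
sameObject          # FALSE
```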

We can also make vectors out of pieces of text called strings: these are called character vectors. As noted in the previous chapter, we use quotes to delimit strings:

strVec <- c("Brains", "are", "not", "the", "best", 
            "things", "in", "the", "world", "93.2")
strVec
##  [1] "Brains" "are"    "not"    "the"    "best"   "things" "in"     "the"    "world" 
## [10] "93.2"
typeof(strVec)
## [1] "character"

Notice that "93.2" makes a string, not a number.

The last type of vector to consider is the logical vector. Here is an example:

logVec <- c(TRUE, FALSE, T, T, F, F, FALSE)
logVec
## [1]  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE

In order to represent a logical value you can use:

  • TRUE or T to represent truth;
  • FALSE or F to represent falsity.

You can’t represent truth or falsity any other way. If you try anything else—like the following—you get an error:

badVec <- c(TRUE, false)
## Error: object 'false' not found

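Note: Although R allows T to be interpreted as TRUE and F as FALSE, relying on them can be dangerous. TRUE and FALSE are reserved words that cannot be reassigned, whereas T and F are ordinary variables that any code can overwrite (for example, with T <- 0). Best to get into the habit of always using TRUE and FALSE, rather than the permitted abbreviations.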
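The following sketch shows one way the abbreviations can go wrong. It deliberately overwrites T, then cleans up afterwards; riskyVec and attempt are names invented for this illustration.

```r
# T is an ordinary variable, so this assignment is legal.
T <- 0

# T now stands for the number 0, so this is no longer a logical vector.
riskyVec <- c(T, TRUE)
riskyVec           # 0 1
typeof(riskyVec)   # "double"

# Remove our variable so that T refers to R's built-in value again.
rm(T)

# TRUE is a reserved word: trying to assign to it raises an error.
attempt <- tryCatch({
  TRUE <- 0
  "assigned"
}, error = function(e) "error")
attempt            # "error"
```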

2.1.2 Coercion

What would happen if you tried to represent falsity with the string "false"?

newVector <- c(TRUE, "false")
newVector
## [1] "TRUE"  "false"

newVector is not a logical vector. Check it out:

typeof(newVector)
## [1] "character"

In order to understand what just happened here, you must recall that all of the elements of an atomic vector have to be of the same type. If the c() function is presented with values of different types, then R follows a set of internal rules to coerce some of the values to a new type in such a way that all resulting values are of the same type. You don’t need to know all of the coercion rules, but it’s worth noting that

  • character beats double,
  • which in turn beats integer,
  • which in turn beats logical.

The following examples show this:

typeof(c("one", 1, 1L, TRUE))
## [1] "character"
typeof(c(1, 1L, TRUE))
## [1] "double"
typeof(c(1L, TRUE))
## [1] "integer"

Automatic coercion can be convenient in some circumstances, but in others it can give unexpected results. It’s best to keep track of what types you are dealing with and to exercise caution when combining values to make new vectors.
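As an illustration of an unexpected result, suppose one number in a vector was accidentally typed with quotes. This is only a sketch; the names mixed, total and flags are hypothetical.

```r
# One stray string turns the whole vector into character.
mixed <- c(10, 20, "30")
typeof(mixed)   # "character"

# sum(mixed) would stop with an error, since strings cannot be added.
# Coercing back to numbers first makes the sum work.
total <- sum(as.numeric(mixed))
total           # 60

# Logical values mixed with numbers quietly become 1 and 0.
flags <- c(5, TRUE, FALSE)
flags           # 5 1 0
```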

You can also coerce vectors “manually” with the functions:

  • as.numeric() ;
  • as.integer() ;
  • as.character() ;
  • as.logical() .

Here are some examples:

numVec <- c(3, 2.5, -7.32, 0)
as.character(numVec)
## [1] "3"     "2.5"   "-7.32" "0"
as.integer(numVec)
## [1]  3  2 -7  0
as.logical(numVec)
## [1]  TRUE  TRUE  TRUE FALSE

Note that in coercion from numerical to logical, the number 0 becomes FALSE and all non-zero numbers become TRUE.
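Manual coercion works in other directions too. The sketch below tries a few more cases; the variable names are chosen only for this example.

```r
# Logical to numeric: TRUE becomes 1 and FALSE becomes 0.
fromLogical <- as.numeric(c(TRUE, FALSE))
fromLogical   # 1 0

# Character to logical: a few spellings are recognized,
# and anything else becomes NA.
fromStrings <- as.logical(c("TRUE", "true", "T", "yes"))
fromStrings   # TRUE TRUE TRUE NA

# Double to integer drops the fractional part (truncation toward zero),
# so 2.9 becomes 2 and -2.9 becomes -2; nothing is rounded.
truncated <- as.integer(c(2.9, -2.9))
truncated     # 2 -2
```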

2.1.3 Combining Vectors

You can combine vectors you have already created to make new, bigger ones:

numVec1 <- c(5, 3, 10)
numVec2 <- c(1, 2, 3, 4, 5, 6)
numCombined <- c(numVec1, numVec2)
numCombined
## [1]  5  3 10  1  2  3  4  5  6

You can see here that vectors are different from sets: they are allowed to repeat the same value at different indices, as we see in the case of the 3’s and the 5’s above.
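Two further points about combining can be sketched as follows: c() always produces one flat vector, and the coercion rules apply when the pieces have different types. The names below (strPiece, mixedCombined, nested, distinctValues) are invented for this illustration, and unique() is a base R function not otherwise discussed in this section.

```r
numVec1 <- c(5, 3, 10)
numVec2 <- c(1, 2, 3, 4, 5, 6)
strPiece <- c("a", "b")

# Combining a double vector with a character vector coerces the numbers.
mixedCombined <- c(numVec1, strPiece)
mixedCombined    # "5" "3" "10" "a" "b"

# Nested calls to c() do not create nested vectors: the result is flat.
nested <- c(c(1, 2), c(3, c(4, 5)))
length(nested)   # 5

# unique() drops repeated values, which shows how many were duplicates.
distinctValues <- unique(c(numVec1, numVec2))
length(distinctValues)   # 7, down from 9
```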

2.1.4 NA Values

Consider the following vector, which we may think of as recording the heights of people, in inches:

heights <- c(72, 70, 69, 58, NA, 45)

The NA in the fifth position of the vector is a special value that stands for “Not Available.” It’s R’s way of letting us indicate that a value was not recorded or has gone missing for some reason.
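A missing value affects computations on the whole vector. The sketch below uses the base R functions is.na() and mean(), which this section has not introduced; the result names are hypothetical.

```r
heights <- c(72, 70, 69, 58, NA, 45)

# is.na() reports, element by element, which values are missing.
missing <- is.na(heights)
missing       # FALSE FALSE FALSE FALSE TRUE FALSE

# A summary that includes a missing value is itself missing.
rawMean <- mean(heights)
rawMean       # NA

# Setting na.rm = TRUE asks mean() to drop the NA before averaging.
cleanMean <- mean(heights, na.rm = TRUE)
cleanMean     # 62.8
```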

2.1.5 “Everything in R is a Vector”

Some folks say that everything in R is a vector. That’s a bit of an exaggeration but it’s remarkably close to the truth.

And yet it seems implausible. What about the elements of an atomic vector, for instance? A single element doesn’t look at all like a vector: it’s a value, not a sequence of values.

Or so we might think. But really, in R there are no “single values” that can exist by themselves. Consider, for instance, what we think of as the number 17:

17
## [1] 17

See the [1] in front, in the output above? It indicates that the line begins with the first element of a vector. So 17 doesn’t exist on its own: it exists as a vector of type double, specifically a vector of length 1.

Even NA is, all along, a vector of length 1:

NA
## [1] NA

It is of type logical:

typeof(NA)
## [1] "logical"

Note that even the type of NA evaluates, in R, to a vector: a character vector of length 1 whose only element is the string "logical"!
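You can check these claims directly. The following sketch uses length() and is.vector() from base R; the variable names are made up for the example.

```r
# A "single value" is a vector of length 1.
length(17)        # 1
is.vector(17)     # TRUE

# Wrapping a value in c() changes nothing: it was already a vector.
sameThing <- identical(17, c(17))
sameThing         # TRUE

# Logical sits at the bottom of the coercion order, so a plain NA
# takes on the type of whatever it is combined with.
withNumber <- typeof(c(NA, 1))
withNumber        # "double"
withString <- typeof(c(NA, "a"))
withString        # "character"
```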

2.1.6 Named Vectors

The elements of a vector can have names, if we like:

ages <- c(Bettina = 32, Chris = 64, Ramesh = 101)
ages
## Bettina   Chris  Ramesh 
##      32      64     101

Having names doesn’t keep the vector from being a vector of type double: it has to be double because its elements are double.

typeof(ages)
## [1] "double"

We can name the elements of a vector when we create it with c(), or we can name them later on. One way to do this is with the names() function:

names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
heights
## Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
##        72        70        69        58        NA        45
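Names can also be read back and used to look up elements. This is a sketch only: the square-bracket operator and unname() are base R features not covered in this section, and the result names are invented here.

```r
ages <- c(Bettina = 32, Chris = 64, Ramesh = 101)

# names() without an assignment returns the names as a character vector.
ageNames <- names(ages)
ageNames      # "Bettina" "Chris" "Ramesh"

# A name inside square brackets picks out the matching element.
chrisAge <- ages["Chris"]
chrisAge      # a named vector of length 1 holding 64

# unname() returns the same values with the names stripped off.
plainAges <- unname(ages)
plainAges     # 32 64 101
```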

2.1.7 Special Character Vectors

R comes with two handy, predefined character vectors:

letters
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u"
## [22] "v" "w" "x" "y" "z"
LETTERS
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"
## [22] "V" "W" "X" "Y" "Z"

We will make use of them from time to time.
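As a quick sketch of how the two vectors relate, the base R functions head() and toupper() (neither introduced in this section) can be applied to them. The result names are hypothetical.

```r
# head() returns the first few elements of a vector.
firstThree <- head(letters, 3)
firstThree     # "a" "b" "c"

# Converting letters to upper case reproduces LETTERS exactly.
matchesUpper <- identical(toupper(letters), LETTERS)
matchesUpper   # TRUE
```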

2.1.8 Length of Vectors

The length() function tells us how many elements a vector has:

length(heights)
## [1] 6
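Keep in mind that length() counts elements, not characters. The sketch below contrasts it with the base R function nchar(), which this section does not cover; the variable names are chosen for the example.

```r
# A single string is a character vector with one element.
oneString <- "Scarecrow"
length(oneString)   # 1

# nchar() counts the characters inside each string.
nchar(oneString)    # 9

# A vector with no elements has length 0.
emptyVec <- character(0)
length(emptyVec)    # 0
```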

2.1.9 Practice Exercises

  1. Consider the following vector:

    upperLower <- c(LETTERS, letters)

    What should the length of upperLower be? Check your answer using the length() function.

  2. True or False: c("a", 2, TRUE) yields a vector of length three consisting of the string "a", the number 2 and the logical value TRUE.

  3. The function as.numeric() tries to coerce its input into numbers. How well can it pick out the “numbers” in strings? Try the following calls. When did as.numeric() find the number that was probably intended?

    as.numeric("3.214")
    as.numeric("3L")
    as.numeric("fifty")
    as.numeric("10 + 3")
    as.numeric("3.25e-3")  # scientific notation:  3.25 times 10^(-3)
    as.numeric("31,245")

2.1.10 Solutions to Practice Exercises

  1. There are 26 letters, so the length of upperLower should be \(2 \times 26 = 52\). Let’s check:

    length(upperLower)
    ## [1] 52
  2. False! The resulting vector will be atomic—all of its elements will be the same data type. The non-strings will be coerced to strings, yielding:

    c("a", 2, TRUE)
    ## [1] "a"    "2"    "TRUE"
  3. as.numeric() isn’t very smart: it picked out the number in "3.214" and "3.25e-3", but in the other cases it returned NA (along with a warning).
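The answer to Exercise 3 can be checked in one step by putting all six strings into a single character vector. This is a sketch rather than part of the original solution: the names inputs, converted and cleaned are made up, suppressWarnings() is used only to silence the coercion warnings, and gsub() is one possible way to strip the comma before converting.

```r
inputs <- c("3.214", "3L", "fifty", "10 + 3", "3.25e-3", "31,245")

# as.numeric() works element by element; failures become NA.
converted <- suppressWarnings(as.numeric(inputs))
is.na(converted)   # FALSE TRUE TRUE TRUE FALSE TRUE

# Removing the comma first lets the last string convert as intended.
cleaned <- as.numeric(gsub(",", "", "31,245"))
cleaned            # 31245
```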