11.2 Characters and Special Characters

Strings are made up of characters: that’s why R calls them “character vectors.” From your point of view as a speaker of the English language, characters would seem to be the things you would have entered on a typewriter, and which can be entered from your computer keyboard as well:

  • the lower-case letters a-z;
  • the upper-case letters A-Z;
  • the digits 0, 1, …, 9 (0-9);
  • the punctuation characters: ., -, ?, !, ;, :, etc. (and of course the comma, too!)
  • a few other special-use characters: ~, @, #, $, %, _, +, =, and so on;
  • and the space, too!

All of the above can be part of a string.

But quote-marks (used in quotation and as apostrophes) can also be part of a string:

"Welcome", she said, "the coffee's on me!"

Since quote-marks are used to delimit strings but can also be part of them, designers of programming languages have to think carefully about how to manage quote-marks. Here’s how it works in R:

  • If you choose to delimit a string with double-quotes, then you can put single-quotes anywhere you like within the string and they will be treated by the computer as literal single-quotes, not as string-delimiters. Here is an example:

    ## 'Hello', she said.
  • If you delimit with double-quotes and you want to place a double-quote in your string, then you have to escape that double-quote with the backslash character \:

    ## "Hello", she said.
  • If you choose to delimit a string with single-quotes, then you can put double-quotes anywhere you like within the string and they will be treated by the computer as literal double-quotes, not as string-delimiters.

    ## "Hello", she said.
  • If you delimit with single-quotes and you want to place a single-quote in your string, then you have to escape that single-quote:

    ## 'Hello', she said.
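
The four cases above can be tried side by side. The following is a minimal sketch: the variable names s1 through s4 are made up for illustration, and the strings are assumed to match the outputs shown in the list.

```r
# Double-quoted string containing literal single-quotes
s1 <- "'Hello', she said."
# Double-quoted string containing escaped double-quotes
s2 <- "\"Hello\", she said."
# Single-quoted string containing literal double-quotes
s3 <- '"Hello", she said.'
# Single-quoted string containing escaped single-quotes
s4 <- '\'Hello\', she said.'

# cat() writes each string to the console, one per line
cat(s1, s2, s3, s4, sep = "\n")

# The choice of delimiter does not change the string that is stored:
identical(s1, s4)  # TRUE
identical(s2, s3)  # TRUE
```

Note that the backslashes never appear in the output of cat(): they are instructions to R about how to read the string, not characters in the string.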

In R and in many other programming languages the backslash \ permits the following character to “escape” any special meaning that is otherwise assigned to it by the language. When we write \" we say that we are “escaping” the double-quote; more precisely, we are escaping the special role of the double-quote as a delimiter for strings.

Of course the foregoing implies that the backslash character has a special role in the language: as an escaping-device. So what can we do if we want a literal backslash in our string? Well, we simply escape it by preceding it with a backslash:

## up\down

Another example:

## C:\\Inetpub\\vhosts\\example.com
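
Here is a sketch of how outputs like the two above could be produced. The inputs are assumptions, since only the outputs are shown; the names s and path are made up for illustration.

```r
# A literal backslash is typed as two backslashes in R source code.
s <- "up\\down"

cat(s, "\n", sep = "")   # cat() shows the string itself: up\down
print(s)                 # print() shows the escaped form: [1] "up\\down"
nchar(s)                 # 7, since the escaped backslash is one character

# Assumption: to make cat() display doubled backslashes, each displayed
# backslash is itself escaped in the source, giving four in a row.
path <- "C:\\\\Inetpub\\\\vhosts\\\\example.com"
cat(path, "\n", sep = "")  # C:\\Inetpub\\vhosts\\example.com
```

The difference between cat() and print() is worth remembering: print() shows you the string as you would have to type it, whereas cat() shows you the characters the string actually contains.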

So much for “ordinary” characters. But there are special characters, too, sometimes called control characters, that do not represent written symbols. We have seen a couple of them already; the newline character \n is one:

## Farewell!  # first \n moves us to a new line ...
##            # ... which is empty due to the next \n

We have also seen the tab-character \t:

## First Name   Last Name

Notice that the backslash character is used here to allow the n and t to escape their customary roles as the letters “n” and “t”, respectively.
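
A short sketch of both characters in action follows. The strings are assumed to match the outputs above, and the names farewell and header are made up for illustration.

```r
# Two newlines: the first ends the line, the second leaves an empty line.
farewell <- "Farewell!\n\n"
cat(farewell)

# A tab between two column headings
header <- "First Name\tLast Name"
cat(header, "\n", sep = "")

# Although each is typed with two keystrokes, an escape sequence
# is stored as a single character:
nchar("\n")       # 1
nchar("\t")       # 1
nchar(farewell)   # 11: the nine characters of Farewell! plus two newlines
```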

If you ask R (try help(Quotes)), you will learn that there are several control characters, including:

Table 11.1: Some control characters.

Character   Meaning
\n          newline
\r          carriage return
\t          tab
\b          backspace
\a          alert (bell)
\f          form feed
\v          vertical tab

It is worth exploring their effects. Here are a couple of examples (see the note at the end of this section):

## Hell o
## Hell
o
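
One way to explore further is to gather the control characters into a vector and inspect them. This is only a sketch: the vector name controls and its element names are made up for illustration, and what you actually see from the final cat() call depends on your console or terminal.

```r
# The control characters from the table, as a named character vector
controls <- c(newline = "\n", carriage_return = "\r", tab = "\t",
              backspace = "\b", alert = "\a", form_feed = "\f",
              vertical_tab = "\v")

# Each one is a single character ...
nchar(controls)

# ... with a small numeric code: 10, 13, 9, 8, 7, 12 and 11, in order
codes <- sapply(controls, utf8ToInt)
codes

# Try each one in the middle of a word and watch what your console does
for (ch in controls) {
  cat("Hell", ch, "o", "\n", sep = "")
}
```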

A number of other non-control characters can be generated with the backslash. Unicode characters, for instance, are generated by \u{nnnn}, where the n’s represent hexadecimal digits. Try the following in your console, and see what you get:

## ☃
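
The snowman above is the Unicode character with hexadecimal code 2603. A call along the following lines would produce it; this is an assumed reconstruction, and the name snowman is made up for illustration.

```r
# \u{2603} is the Unicode escape for the snowman character
snowman <- "\u{2603}"
cat(snowman, "\n", sep = "")

# The escape sequence yields one character whose code is hexadecimal 2603
nchar(snowman)                 # 1
utf8ToInt(snowman) == 0x2603   # TRUE
```

Whether the symbol displays properly depends on the fonts available to your console.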

Or, for something zanier:

## Hello‮there, Friend!
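
The output above contains an invisible character between "Hello" and "there": the right-to-left override, with hexadecimal code 202E, which asks the display to render what follows it from right to left. The sketch below assumes that this is how the string was built; the name zany is made up for illustration.

```r
# \u{202e} is the right-to-left override character
zany <- "Hello\u{202e}there, Friend!"
cat(zany, "\n", sep = "")

# The characters are stored in the order typed; only the display changes.
nchar(zany)                              # 20, counting the invisible override
utf8ToInt(substr(zany, 6, 6)) == 0x202E  # TRUE: the sixth character is the override
substr(zany, 7, 11)                      # "there"
```

How the text appears on screen will vary from one console to another, since not every display honors the override.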

  1. Note that cat("Hell\ao") won’t give you “Hello” with a bell-sound. To hear a bell you have to work with a terminal on your own computer. On Linux or Mac, type echo -e "\a" and you should hear a beep.