2.8 Further Notes on Syntax
In the process of learning about R, you have been unconsciously imbibing some of its syntax. The syntax of a computer-programming language is the complete set of rules that determine what combinations of symbols are considered to make a well-formed program in the language, that is, something that R can interpret and attempt to execute.
2.8.1 Syntax Errors vs. Run-time Errors vs. Semantic Errors
For the most part you will learn the syntax informally. By now, for example, you have probably realized that when you call a function you have to supply a closing parenthesis to match the open parenthesis. Thus the following is completely fine:
sum(1:5)
## [1] 15
On the other hand, if you were to type sum(1:5 alone on a single line in an R script, R Studio’s code-checker would show a red warning-circle at that line. Hovering over the circle you would see the message:
unmatched opening bracket '('
If you were to attempt to run the command sum(1:5 from the script, you would get the following error message:
## Error: Incomplete expression: sum(1:5
Such an error is called a syntax error.4 The R Studio IDE can detect most—but not all—syntax errors.
Syntax errors in computer programming are similar to grammatical errors in ordinary language, such as:
- “Mice is scary.” (Number of the subject does not match the number of the verb.)
- “Mice are.” (Incomplete expression.)
A run-time error is an error that occurs when the syntax is correct but R is unable to finish the execution of your code for some other reason. The following code, for example, is perfectly fine from a syntactical point of view:
sum("hello")
When run, however, it produces an error:
## Error in sum("hello") : invalid 'type' (character) of argument
Here is another example:
sum(emeraldCity)
Unless for some reason you have defined the variable emeraldCity, an attempt to run the above command will produce the following run-time error:
## Error: object 'emeraldCity' not found
Many run-time errors in computer programming resemble errors in ordinary language where the sentence is grammatically correct but does not mean anything, as in:
- “Beelbubs are juicy.” (What’s a “beelbub?”)
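You can see the difference between these first two kinds of error from within R itself. The following is only a sketch, using the base functions parse(), eval() and try(); the variable names are made up for the illustration. parse() merely checks whether text is well-formed R code, whereas eval() actually runs it.

```r
# Syntax error: the text cannot even be parsed, so nothing is ever run.
parsed <- try(parse(text = "sum(1:5"), silent = TRUE)
syntax_failed <- inherits(parsed, "try-error")

# Run-time error: the text parses without complaint ...
expr <- parse(text = 'sum("hello")')
# ... but the attempt to evaluate it fails.
result <- try(eval(expr), silent = TRUE)
runtime_failed <- inherits(result, "try-error")

syntax_failed    # TRUE
runtime_failed   # TRUE
```

In the first case the failure happens at the parsing stage; in the second, parsing succeeds and the failure comes only when R tries to carry out the command.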
There is a third type of error, known in the world of programming as a semantic error. The term “semantics” refers to the meaning of things. Computer code is said to contain a semantic error when it is syntactically correct and can be executed, but does not deliver the results one knows to expect.
As an example, suppose you have defined, at some point, two variables:
emeraldCity <- 15
emeraldcity <- 4
Suppose now that—wanting R to compute \(15^2\)—you run the following code:
emeraldcity^2
## [1] 16
You don’t get the results you wanted, because you accidentally asked for the square of the wrong number.
Semantic errors are usually the most difficult errors for programmers to detect and repair.
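Since R raises no complaint about a semantic error, one modest defense is to compare a result with a value you have worked out by hand. The sketch below reuses the two variables from the example above; the name result is an assumption made for the illustration.

```r
emeraldCity <- 15
emeraldcity <- 4

result <- emeraldcity^2   # runs without error, but squares the wrong variable
result == 15^2            # FALSE: the check exposes the mistake
```

A check like this does not tell you where the mistake is, but it does tell you that there is one to look for.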
2.8.2 The Assignment Operator
We have been using the assignment operator <- to assign values to variables. You should be aware that there is another assignment operator that works the other way around:
4 -> emeraldCity
emeraldCity
## [1] 4
Most people don’t use it.
A popular alternative to <- as an assignment operator is the equals sign =:
emeraldCity = 5
emeraldCity
## [1] 5
I myself prefer to stay away from it, as it can be confused with other uses of =, such as the setting of values to parameters in functions:
rep("Dorothy", times = 3)
## [1] "Dorothy" "Dorothy" "Dorothy"
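To see that the two uses of = really are different, it may help to compare what happens when <- is written inside a function call instead. This is a sketch for illustration only, not a recommended style; the names out1 and out2 are assumptions, and the first check presumes a fresh session in which no variable called times exists yet.

```r
# Inside a function call, = matches a value to the parameter named 'times'.
# No variable called 'times' is created.
out1 <- rep("Dorothy", times = 3)
exists("times", inherits = FALSE)   # FALSE in a fresh session

# With <-, R first creates a variable 'times', and then passes its value
# to rep() as an unnamed (positional) argument.
out2 <- rep("Dorothy", times <- 3)
times                               # 3
identical(out1, out2)               # TRUE
```

The two calls return the same vector, but only the second one leaves a new variable behind in your workspace.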
When you have to assign the same value to several variables, R allows you to abbreviate a bit. Consider the following code:
a <- b <- c <- 5
The above code has the same effect as:
a <- 5
b <- 5
c <- 5
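The abbreviation works because an assignment in R returns the assigned value (invisibly), and a chain of assignments is read from right to left. Here is a brief sketch; the extra variable d is made up for the illustration.

```r
# Read right to left: the line below is equivalent to a <- (b <- (c <- 5)).
a <- b <- c <- 5
c(a, b, c)   # 5 5 5

# Wrapping an assignment in parentheses makes the returned value visible.
(d <- 5)     # prints 5
```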
2.8.3 Multiple Expressions
R allows you to write more than one expression on a single line, as long as you separate the expressions with semicolons:
a <- b <- c <- 5
a; b; c; 2+2; sum(1:5)
## [1] 5
## [1] 5
## [1] 5
## [1] 4
## [1] 15
2.8.4 Variable Names and Reserved Words
Using the assignment operator we have created quite a few variables by now, and we appear to have named them whatever we want. In fact there are very few limitations on the name of a variable. According to R’s own documentation:5
“A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.”
This leaves a lot of room for creativity. All of the following names are possible for variables:
yellowBrickRoad
yellow_brick_road
yellow.brick.road
yell23
y2e3L45Lo3....rOAD
.yellow
The following, though, are not valid:
.2yellow (cannot start with dot and then number)
_yellow (cannot start with _)
5scones (cannot start with a number)
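If you are ever unsure about a name, you can ask R. One possible check, sketched below, relies on the base function make.names(), which leaves a syntactically valid name unchanged and alters an invalid one. The vectors candidates and is_valid are names chosen for the illustration.

```r
candidates <- c("yellowBrickRoad", "yellow_brick_road", "yellow.brick.road",
                ".yellow", ".2yellow", "_yellow", "5scones")

# A name is valid exactly when make.names() has nothing to repair.
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
is_valid
# The first four names come out TRUE, the last three FALSE.
```

Note that make.names() also alters R’s reserved words, so this check rejects them as well.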
Most programmers try to devise names for variables that are descriptive in the sense that they suggest to a reader of the code the role that is played within it by the variable. In addition they try to stick to a consistent system for variable names that divide naturally into meaningful words.
One popular convention is known as CamelCase. In this convention each new word-like part of the variable name begins with a capital letter. (The initial letter, though, is often not capitalized.) Examples would be:
emeraldCity
isEven
Another popular convention—sometimes called “snake-case”—is to use lowercase and to separate words with underscores:
emerald_city
is_even
An older convention—one that was popular among some of the original developers of R—was to separate words with dots:
emerald.city
is.even
This last convention is no longer recommended, as in programming languages other than R the dot is associated syntactically with the calling of a “method” on an “object.”6
There is one further restriction on variable-names that we have not yet mentioned: you are not allowed to use any of R’s reserved words. These are:
if, else, while, repeat, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_
You need not memorize the above list: You’ll gradually learn most of it, and words you don’t learn are words that you are unlikely to ever choose as a variable-name on your own. Besides, reserved words show up in blue in the R Studio editor, and if you manage to use one anyway then R will stop you outright with a clear error message:
break <- 5
## Error in break <- 5 : invalid (NULL) left side of assignment
Notice also that although TRUE and FALSE are reserved words, their accepted abbreviations T and F are not. This can lead to problems in code, if someone chooses to bind T or F to some value.
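The contrast between a reserved word and an ordinary name can be checked directly. The helper try_assign() below is hypothetical, written only for this sketch: it attempts an assignment inside a throwaway environment, so that nothing in your workspace is changed, and reports whether R allowed it.

```r
# Hypothetical helper: try an assignment in a throwaway environment
# and report whether R allowed it.
try_assign <- function(code) {
  tryCatch({
    eval(parse(text = code), envir = new.env())
    "allowed"
  }, error = function(e) "error")
}

try_assign("TRUE <- 0")    # "error": TRUE is a reserved word
try_assign("break <- 5")   # "error": break is a reserved word
try_assign("T <- 0")       # "allowed": T is an ordinary variable name
```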
For example, suppose that you have two lines of code like this:
T <- 0
F <- 1
Later on, suppose you create what you think is a logical vector:
myVector <- c(T, F, F, T)
But it’s not logical:
typeof(myVector)
## [1] "double"
That’s because T and F have been bound to numerical values. If you coerce myVector to a logical vector, you get the exact opposite of what you would have expected:
as.logical(myVector)
## [1] FALSE TRUE TRUE FALSE
The moral of the story is: never use T for TRUE or F for FALSE.
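The spelled-out forms are immune to this problem, as the following sketch suggests. The vector names abbreviated and spelled_out are assumptions made for the illustration.

```r
T <- 0
F <- 1
abbreviated <- c(T, F, F, T)
spelled_out <- c(TRUE, FALSE, FALSE, TRUE)

typeof(abbreviated)   # "double"
typeof(spelled_out)   # "logical"

# Removing the user-defined bindings makes T and F refer to the
# built-in values again.
rm(T, F)
identical(T, TRUE)    # TRUE
```

Because TRUE and FALSE are reserved words, nobody can rebind them, so code that uses them behaves the same no matter what else has been defined.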
One final remark: variables together with reserved words constitute the part of the R language called identifiers.
2.8.5 Practice Exercises
Suppose that you begin a new R session and that you run the following code:
person <- c("Abe", "Bettina", "Candace", "Devadatta", "Esmeralda")
numberKids <- c(2, 1, 0, 2, 3)
yearsEducation <- c(12, 16, 13, 14, 18)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
1. What sort of error (syntax, runtime or semantic) is produced by this next piece of code, which is intended to produce the names of the people with more than 15 years of education? Why?
   person(yearsEducation > 15]
2. What sort of error (syntax, runtime or semantic) is produced by this next piece of code, which is intended to produce the names of the people who don’t have pets? Why?
   person[!haspets]
3. What sort of error (syntax, runtime or semantic) is produced by this next piece of code, which is intended to find out how many people have pets? Why?
   length(hasPets)
4. Is ThreeLittlePigs a valid name for a variable? If not, why not?
5. Is 3LittlePigs a valid name for a variable? If not, why not?
6. Is LittlePigs3 a valid name for a variable? If not, why not?
7. Is Little-Pigs-3 a valid name for a variable? If not, why not?
8. Is Little_Pigs_3 a valid name for a variable? If not, why not?
9. Is three.little.pigs a valid name for a variable? If not, why not?
2.8.6 Solutions to the Practice Exercises
1. This will result in a syntax error. You need brackets to select, and you’ve got a parenthesis on the left. The correct syntax would be:
   person[yearsEducation > 15]
2. This will result in a runtime error. The variable haspets is not defined, so R will issue a “can’t find” error when the code is executed. Probably you meant:
   person[!hasPets]
3. This will result in a semantic error. You’ll get the number of elements in the vector hasPets, not the number of people who have pets. What you want can be accomplished by either one of the following:
   sum(hasPets) # nice and snappy!
   length(hasPets[hasPets]) # kinda awkward
4. Yes.
5. No! Variables can’t start with a number.
6. Yes.
7. No! Hyphens aren’t allowed in variables.
8. Yes.
9. Yes.
4. R is a bit more forgiving if you type sum(1:5 directly into the console and press Enter. Instead of throwing an error, R shows a + prompt, hoping for further input that would correctly complete the command. If you are ever in the situation where you do not know how to complete the command, you may simply press the Escape key (upper left-hand corner of your keyboard): R will then abort the command and return to a regular prompt.
5. See help(make.names).
6. We will look briefly at R’s object-oriented capabilities in Chapter 15.