The function `xyplot()` makes scatterplots to indicate the relationship between two numerical variables. It comes from the `lattice` package for statistical graphics, which is pre-installed with every distribution of R. Also, package `tigerstats` depends on `lattice`, so if you load `tigerstats`:

`require(tigerstats)`

then `lattice` will be loaded as well.

Suppose you want to know:

Do students with higher GPAs tend to drive more slowly than students with lower GPAs?

If so, then you might check to see if the numerical variable `fastest` (in the `m111survey` data frame from the `tigerstats` package) is related to the numerical variable `GPA`. Then you can make a scatterplot as follows:

```
xyplot(fastest ~ GPA, data = m111survey,
       xlab = "grade point average",
       ylab = "speed (mph)",
       main = "Fastest Speed Ever Driven,\nby Grade Point Average")
```

Note the use of:

- the `xlab` argument to label the horizontal axis;
- the `ylab` argument to label the vertical axis, complete with units (miles per hour);
- the `main` argument to provide a brief but descriptive title for the graph;
- the `\n` to make two lines in the title (useful if you have a long title).

When we think of one variable as explanatory and the other as the response, it is common to put the explanatory on the horizontal axis and the response on the vertical axis. This is accomplished by the formula

\[response \sim explanatory\]
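As a minimal sketch of this convention, assuming only the `lattice` package and R's built-in `mtcars` data (swapped in here so that the example runs without `tigerstats`), the call below treats car weight as the explanatory variable and fuel economy as the response:

```r
library(lattice)

# response ~ explanatory: mpg goes on the vertical axis, wt on the horizontal
p <- xyplot(mpg ~ wt, data = mtcars,
            xlab = "weight (1000 lbs)",
            ylab = "fuel economy (miles per gallon)")
print(p)  # a lattice plot is an object; printing it draws the graph
```

Swapping the two sides of the formula would swap the axes.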

If you want a regression line along with your scatterplot, use the argument `type`, as follows:

```
xyplot(fastest ~ GPA, data = m111survey,
       xlab = "grade point average",
       ylab = "speed (mph)",
       main = "Fastest Speed Ever Driven,\nby Grade Point Average",
       type = c("p", "r"))
```

The character vector `c("p","r")` tells `xyplot()` that we want both the points (“p”) and a regression line (“r”).
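The line drawn by “r” is an ordinary least-squares line, so if you also want its equation you can fit the same model yourself. The sketch below assumes R's built-in `mtcars` data rather than `m111survey`, so that it runs with `lattice` alone:

```r
library(lattice)

# Scatterplot with points ("p") and a regression line ("r")
p <- xyplot(mpg ~ wt, data = mtcars, type = c("p", "r"))
print(p)

# Fit the same least-squares line directly to read off its equation
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # intercept and slope of the line
```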

You can also get loess curves:

```
xyplot(fastest ~ GPA, data = m111survey,
       xlab = "grade point average",
       ylab = "speed (mph)",
       main = "Fastest Speed Ever Driven,\nby Grade Point Average",
       type = c("p", "smooth"))
```
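If the loess curve looks too wiggly or too stiff, one option is to draw it yourself with a custom panel function and choose the `span`. This is only a sketch: it uses R's built-in `mtcars` data instead of `m111survey`, and the value `span = 0.9` is an arbitrary illustration (see `?panel.loess` for the default).

```r
library(lattice)

p <- xyplot(mpg ~ wt, data = mtcars,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)        # draw the points
              panel.loess(x, y, span = 0.9)  # add a smoother loess curve
            })
print(p)
```

Larger values of `span` use more of the data for each local fit, which gives a smoother curve.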

The `groups` argument enables you to introduce a third variable into your analysis, so long as that variable is a factor. For example, suppose you want to study the relationship between `fastest` and `GPA`, but see if the nature of the relationship is different for women than it is for men. You can do this with the `groups` argument:

```
xyplot(fastest ~ GPA, data = m111survey,
       groups = sex,
       auto.key = TRUE,
       xlab = "grade point average",
       ylab = "speed (mph)",
       main = "Fastest Speed Ever Driven,\nby Grade Point Average",
       type = c("p", "r"))
```

Setting `groups = sex` produced two scatterplots (with regression lines) overlaid on the same axes, one for the gals and another for the guys. The legend at the top of the graph was produced by `auto.key = TRUE`.

You can vary the type of point using the `pch` argument. For example:

```
xyplot(fastest ~ GPA, data = m111survey,
       xlab = "grade point average",
       ylab = "speed (mph)",
       main = "Fastest Speed Ever Driven,\nby Grade Point Average",
       pch = 19)
```

The standard plotting symbols for `pch` are numbered by the integers 0 through 25.
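To see what each symbol looks like, you can plot one point per code. The sketch below assumes R's standard symbol codes 0 through 25 and places the symbols along a diagonal, so that the axis labels tell you which code produced which symbol:

```r
library(lattice)

# One point per symbol code; read the code for a symbol off either axis
codes <- 0:25
p <- xyplot(codes ~ codes,
            pch = codes, cex = 1.5, col = "black",
            xlab = "pch value", ylab = "pch value")
print(p)
```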

If you want to set a specific color for all of the points on your plot, the `col` argument is your friend:

```
xyplot(fastest ~ GPA, data = m111survey,
       xlab = "grade point average",
       ylab = "speed (mph)",
       main = "Fastest Speed Ever Driven,\nby Grade Point Average",
       pch = 19,
       col = "red")
```

There are many, many values for `col`. You can explore all 657 of the named colors with the command:

`colors()`
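Since the full list is long, it can help to search it. As a small sketch (the variable names `all_colors` and `blues` are just illustrative), the following keeps only the color names that contain the word "blue":

```r
# colors() returns a character vector of every named color
all_colors <- colors()
length(all_colors)

# Narrow the list to names containing "blue"
blues <- grep("blue", all_colors, value = TRUE)
head(blues)
```

Any of the resulting names can be passed to `col`.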

When you use the `groups` argument, you may want to customize the colors that go with each group. This is accomplished by temporarily changing the `lattice` parameter settings via the `par.settings` argument. For example, to make the female points blue and the male points red, do the following:

```
xyplot(fastest ~ GPA, data = m111survey,
       groups = sex,
       auto.key = TRUE,
       par.settings = list(
         superpose.symbol = list(col = c("blue", "red"), pch = 19),
         superpose.line = list(col = c("blue", "red"), lwd = 2)
       ),
       xlab = "grade point average",
       ylab = "speed (mph)",
       main = "Fastest Speed Ever Driven,\nby Grade Point Average",
       type = c("p", "smooth"))
```

In `par.settings`, the `superpose.symbol` list specifies properties of the points in the scatterplot, and `superpose.line` specifies properties of the lines drawn for each group. (Setting `lwd` to 2 made the loess curves a bit thicker.)
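If you are wondering which properties can be set in these lists, you can inspect the current defaults with `trellis.par.get()`. This is a minimal sketch; the variable names `sym` and `lin` are just illustrative:

```r
library(lattice)

# Look up the current defaults for grouped points and grouped lines
sym <- trellis.par.get("superpose.symbol")
lin <- trellis.par.get("superpose.line")

names(sym)  # settable properties of grouped points, such as col and pch
sym$col     # the default colors that the groups cycle through
lin$lwd     # the default line widths
```

Anything you do not override in `par.settings` keeps these default values.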

You can incorporate additional variables into your analysis by *faceting*, i.e., producing a plot with separate panels for each of several subgroups of the observations, as determined by one or two other variables. Unlike the variable in the `groups` argument, these don’t have to be factor variables!

Suppose, for example, that we would like to study the relationship between GPA and fastest speed ever driven, but to break the subjects down further into groups determined by their height and by where they prefer to sit in a classroom. The following code accomplishes this:

```
Height <- equal.count(m111survey$height, number = 2, overlap = 0.1)
xyplot(fastest ~ GPA | Height * seat,
       data = m111survey, pch = 19, type = c("p", "r"),
       layout = c(2, 3))
```

In the code above, the line:

`Height <- equal.count(m111survey$height, number = 2, overlap = 0.1)`

prepares us to facet by height. The `equal.count()` function takes a numerical variable and divides its values into groups of approximately equal size. The number of groups is specified by the `number` argument. In this case we are asking for two groups: “shorter” students and “taller” students. The groups are permitted to contain some members in common, and the allowed fraction of overlap is specified by the `overlap` argument. (Setting `overlap = 0` would result in completely disjoint groups.) The new variable `Height` is called a *shingle*, but you can think of it as a factor variable with two values: shorter and taller.
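To get a feel for what a shingle contains, you can print its levels. The sketch below assumes R's built-in `mtcars` data (so that it runs with `lattice` alone) and splits car weights in the same way the code above splits student heights; the name `Weight` is just illustrative:

```r
library(lattice)

# Split car weights into two slightly overlapping groups of similar size
Weight <- equal.count(mtcars$wt, number = 2, overlap = 0.1)

class(Weight)   # the object is a shingle
levels(Weight)  # the interval of values covered by each group
```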

The formula `fastest ~ GPA | Height * seat` *facets* by `Height` and `seat`. The variables by which you facet appear after a `|` bar, and if you facet by two variables then you separate them with a `*`. Since `Height` has two values and `seat` has three values and \(2 \times 3 = 6\), we arrive at a plot with six panels.

The `layout` argument determines the number of rows and columns in our faceted plot. Setting `layout` to `c(2,3)` specifies two columns and three rows. (Note that the columns are specified first!)
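The same panel arithmetic can be checked on any data set. As a sketch, assuming R's built-in `mtcars` data (where `cyl` takes three values and `am` takes two), the following facets by two variables and arranges the six panels in two columns and three rows:

```r
library(lattice)

# Facet by two factors: 3 cylinder counts x 2 transmission types = 6 panels
p <- xyplot(mpg ~ wt | factor(cyl) * factor(am), data = mtcars,
            pch = 19,
            layout = c(2, 3))  # columns first, then rows
print(p)

# Number of levels of each conditioning variable
sapply(p$condlevels, length)
```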

For other refinements, consult the **Lattice Scatter Plot** Addin in RStudio.