Preliminaries

We assume that the instructor is familiar with R, and has a basic understanding of the mosaic package and its use in the teaching of statistics. Consult, e.g., the mosaic vignette “Start Teaching Statistics Using R.”

Package Requirements

In its current version, package tigerstats depends on the packages mosaic, mosaicData and abd. All of these will be attached when one attaches tigerstats:

library(tigerstats)

Basic Routines

tigerstats is used in an elementary classroom setting where students are taught to choose descriptive and inferential methods based upon an analysis of the variables that pertain to a question of interest. Thus if a student finds that she is interested in the relationship between two factor variables, then she knows that she can examine the question descriptively using bar charts, two-way tables, etc., whereas histograms, density plots and the like are not appropriate. The vignettes:

  • R: Descriptive Statistics
  • R: Inferential Statistics

serve for students as a reminder of which procedures are appropriate for various types of questions about data (descriptive) or about the population/process from which the data are derived (inferential). Prospective instructors should consult these documents to get a sense of how the major package functions figure into an elementary course.

There are also vignettes to guide students on the use of specific functions that appear frequently in an elementary course. Some vignettes cover functions from R’s stats package; others cover graphical functions from the lattice package (although when tigerstats is attached, some of these, e.g., histogram(), actually call package mosaic wrappers for the corresponding lattice function). Of course some vignettes also cover functions from tigerstats itself.

Wrapper functions in tigerstats always have names distinct from the functions they wrap. For example, the tigerstats function for t-procedures is ttestGC(), not t.test(). Hence no function from tigerstats masks any function from stats, lattice, or mosaic. Thus if you call, for example, t.test() when tigerstats is attached, you will get mosaic’s wrapper function for the standard t-procedure function t.test() in R’s package stats.
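One way to check which attached packages supply a given name is base R's find(). The sketch below is illustrative only: it uses nothing beyond base R, so it runs whether or not tigerstats is attached, and what it reports depends on which packages are on the search path at the time.

```r
# find() lists every attached package or environment that defines a name,
# in search-path order; the first entry is the one R will actually call.
providers <- find("t.test")
print(providers)

# The tigerstats wrapper has its own name, so it shows up only when
# tigerstats is attached (otherwise the result is character(0)).
print(find("ttestGC"))
```

If a package that wraps t.test() is attached, it should appear ahead of package:stats in the first result.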

As in package mosaic, an attempt is made to provide a fairly uniform interface for all major functions:

  • When one is examining a single factor or numeric variable var from a data frame myData, the general format is:

\[function(\sim var, data=myData,\ldots)\]

  • When studying the relationship between a numerical variable num and a factor variable fac, the format is:

\[function(num \sim fac, data=myData,\ldots)\]

  • When studying the relationship between two factor variables fac1 and fac2, the format is:

\[function(\sim fac1 + fac2, data=myData,\ldots)\]

This last format is in analogy with the formula interface used by R’s xtabs() function, e.g.,

xtabs(~sex+seat,data=m111survey)
##         seat
## sex      1_front 2_middle 3_back
##   female      19       16      5
##   male         8       16      7

  • When studying the relationship between two numerical variables y and x, the format is:

\[function(y \sim x, data=myData,\ldots)\]
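The four formats above can be sketched as follows. This is only an illustration: the choice of favstats() (from mosaic) and lmGC() (from tigerstats) as representative functions, and of the m111survey variables fastest and GPA, is an assumption made for the example and is not prescribed by this vignette. The code is guarded so that it does nothing when tigerstats is not installed.

```r
# Check for the package first so the sketch is safe to source anywhere.
has_tigerstats <- requireNamespace("tigerstats", quietly = TRUE)

if (has_tigerstats) {
  library(tigerstats)  # also attaches mosaic, mosaicData and abd

  # One numeric variable: ~ var
  print(favstats(~ fastest, data = m111survey))

  # Numeric variable broken down by a factor: num ~ fac
  print(favstats(fastest ~ sex, data = m111survey))

  # Two factor variables: ~ fac1 + fac2
  print(xtabs(~ sex + seat, data = m111survey))

  # Two numeric variables: y ~ x
  print(lmGC(fastest ~ GPA, data = m111survey))
}
```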

In any of the above cases, if the data argument is not supplied, or if a variable in the formula is not found in data, then the variable is searched for along a path that begins with the environment in which the function was called (typically the Global Environment, if the function is being used interactively). In particular, not every variable in the formula has to be present in the data frame supplied to data. This mimics the behavior of R’s stats functions.
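The lookup rule can be seen with xtabs() from R's stats package, whose behavior the tigerstats functions are said to mimic. The data frame myData and the variable other below are made up for this sketch; only base R is needed.

```r
# Hypothetical toy data, not taken from any package.
myData <- data.frame(
  fac = factor(c("a", "a", "b", "b", "b")),
  num = c(1.2, 3.4, 2.2, 5.1, 4.0)
)

# A factor that lives in the calling environment, not in myData.
other <- factor(c("x", "y", "x", "y", "x"))

# fac is found in myData; other is not, so R falls back to the
# environment in which the formula was created.
tab <- xtabs(~ fac + other, data = myData)
print(tab)
```

The call succeeds even though other is not a column of myData, and the table counts all five rows.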

Getting Help

Students meet the functions first in class or while reading Course Notes or studying slides associated with the course. As they become more familiar with the functions they are encouraged to use help features to refresh their memory quickly. As is typical with Help pages in an R package, the help for functions is often more helpful to an instructor or a developer than to a student. On the other hand, the examples for the major functions are intended to be fairly comprehensive, so calling example() for a particular tigerstats function is likely to be useful.

Students and instructors who desire a more extended refresher or a tutorial that covers most of the common uses of a given function should consult the vignette for that function. If you know the name of the function you are interested in, then you can access the vignette quickly using helpGC(), thus:

helpGC(bwplot)

or

helpGC(ttestGC)
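The other help routes mentioned above can be sketched in the same way. The example uses example() and vignette() from R's utils package; the choice of ttestGC as the topic is only illustrative, and the code is guarded so that it does nothing when tigerstats is not installed.

```r
# Check for the package first so the sketch is safe to source anywhere.
has_tigerstats <- requireNamespace("tigerstats", quietly = TRUE)

if (has_tigerstats) {
  library(tigerstats)

  # Run the examples from the help page for ttestGC().
  example(ttestGC)

  # List the vignettes that ship with the package.
  print(vignette(package = "tigerstats"))
}
```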

Applets

tigerstats includes a number of instructional applets.

Shiny Apps

Some of the applets are Shiny apps:

  • CentralLimit: an exploration of the Central Limit Theorem.
  • CIMean: an exploration of the coverage properties of the t-interval for one population mean, when the standard deviation of the population is unknown.
  • CoinFlip: an exploration of inference in a binomial situation that can be used to introduce hypothesis testing early in a course.
  • FindRegLine: yet another game in which one tries to approximate the regression line. Score is kept.
  • RandomExpBinom: an exploration of randomized experiments with two treatment groups, in which the response is Bernoulli. This can be used to explore inference as soon as randomized experimental designs are introduced in class.
  • SamplingMethods: This allows students to distinguish visually between simple random sampling, stratified sampling, and cluster sampling.
  • ShallowReg: Sometimes when students see a scatter plot that is based on a random sample from a bivariate normal distribution, they comment that the regression line for the data looks “too shallow.” This app attempts to explain why the regression line is the preferred line for predicting \(y\)-values from \(x\)-values, even though the “SD line” (the line passing through \((\bar{x},\bar{y})\) and having slope \(s_y/s_x\)) sometimes appears to be the line that best summarizes the scatter plot.
  • SlowGoodness: This app introduces students to the chi-square test for goodness of fit.
  • Type12Errors: Exploration of Type-I and Type-II errors, in the context of inference for one population mean when the standard deviation is unknown.

Any of the above apps may be run locally, using the shiny package that is installed along with tigerstats. For example, to run SlowGoodness simply type:

shiny::runApp(system.file("SlowGoodness",package="tigerstats"))
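A slightly more defensive version of the same call is sketched below. It relies on the fact that system.file() returns an empty string when the package or directory cannot be found; the message text and the interactive() guard are choices made for this sketch.

```r
# Locate the app directory inside the installed package.
# system.file() returns "" when the package or directory is not found.
app_dir <- system.file("SlowGoodness", package = "tigerstats")

if (!nzchar(app_dir)) {
  message("SlowGoodness not found; is tigerstats installed?")
} else if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  # Launches the app and blocks the session until the app is stopped.
  shiny::runApp(app_dir)
}
```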

In classroom settings one often uses the server version of the RStudio IDE. Depending on how the sysadmin has set permissions, it may then not be possible to run Shiny apps locally. In that case you should install the apps on a Shiny server for remote access. Obtain the source code for the apps from the inst directory of the tigerstats package on GitHub:

https://github.com/homerhanumat/tigerstats

Manipulate Apps

Some other apps make use of the manipulate package that comes with RStudio, and can only be run in this environment. Some of the apps most likely to be used are:

  • BinomNorm() and BinomSkew() for exploring the binomial family of distributions.
  • CIMean() is a manipulate alternative to the Shiny app of the same name.
  • CIProp() explores coverage properties for confidence intervals for a single proportion.
  • DtrellHist() and DtrellScat() provide a dynamic form of trellis graphics, for exploration of data conditioned upon the values of an external variable.
  • EmpRule() helps students understand when the 68-95 Rule can be trusted, and when it cannot.
  • EmpRuleGC() is a graphical calculator for the 68-95 Rule. Some students find it amusing to be able to use it.
  • FindRegLine() is the poor man’s version of the Shiny app of the same name.
  • MeanSampler() is a poor man’s version of the Shiny app CIMean.
  • SlowGoodness() is the poor man’s version of the Shiny app of the same name.
  • Type12Errors() is the poor man’s version of the Shiny app of the same name.
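Because these apps work only inside RStudio, it can help to check the environment before calling one. The sketch below assumes that RStudio sets the environment variable RSTUDIO to "1" in its R sessions, and that BinomSkew() can be called with no arguments (as the notation in the list above suggests); both are assumptions of this example.

```r
# Assumption: RStudio sets RSTUDIO="1" in the R sessions it starts.
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

if (in_rstudio && requireNamespace("tigerstats", quietly = TRUE)) {
  library(tigerstats)
  # Assumed to need no arguments; the controls appear in the Plots pane.
  BinomSkew()
} else {
  message("manipulate apps need RStudio with tigerstats installed.")
}
```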

Other Apps

The function ChisqSimSlow() is used to introduce the chi-square test for association between two factor variables. It does not require package manipulate.

General Reference

mosaic and tigerstats are used to teach elementary statistics at Georgetown College. The elementary course at Georgetown serves a dual purpose, as a recommended option for fulfilling the general education mathematics requirement and as a service course for several majors. Many of the course materials are online at:

http://statistics.georgetowncollege.edu

The reader may wish to consult this website, especially the Course Notes, to see how mosaic and tigerstats support the selection and arrangement of topics that are particular to this course.