0.1 The Why of These Notes: Remarks for Colleagues
There is a plethora of books on R, covering pretty much every domain of application of the language, from ecology to spatial statistics to machine learning and data science. There are even some books—among the very finest of R-books, in my view—on R as a programming language.
On the other hand, R is designed for one-line interactivity at the console, so it’s possible for a beginner to get simple programs working quickly. The R-ecosystem has also become a lot more user-friendly in recent years. The RStudio IDE is comparable to top-flight integrated development environments for many other major languages and yet is still relatively lightweight and accessible to beginners. The Server version of RStudio is especially useful for new programmers, as it saves them from having to deal with installation and other IT issues on their own machines, permitting them to focus on coding. It’s also quite convenient, in a server setting, to make class materials available and to collect and return assignments. R Markdown is a fine platform for producing course notes (this book is written in R Markdown with the excellent bookdown package (Xie 2020)) and slides as well. Students, too, can use R Markdown to both write and discuss their programs in a single document. The blogdown package (Xie, Dervieux, and Presmanes Hill 2021) permits students to begin writing for the public about technical programming issues (or about anything at all, really, as more than a few of them are taking majors in the Humanities), thus building up a professional resume of online work. When it’s time to learn about databases, students can leverage a body of recent work (see Databases Using R) that renders the RStudio environment nearly as friendly for interaction with databases as dedicated tools such as MySQL Workbench. Finally, the shiny package (Chang et al. 2020) permits students to build simple interactive web apps for data analysis that can be used by non-coders. Both blogdown and shiny prompt students to consider early on (even in the first year, if the pacing is right) concepts of web design, the other focus of the minor.
Hence the choice was made to teach a first-year computer science course, to beginning programmers, with R. As I pointed out earlier, there do exist some excellent books on R as a programming language that do not presume previous experience with R. One example is Norman Matloff’s The Art of R Programming (Matloff 2011). Matloff, however, presumes that the reader either has prior programming experience in some other language or else possesses sufficient computational maturity, acquired perhaps through extensive prior training in the mathematical sciences. Another great text is Garrett Grolemund’s Hands-on Programming with R (Grolemund 2014). Grolemund’s book is lively and to-the-point, and starts off with excellent motivating examples. Grolemund is also a master explainer, and he has put considerable effort into visual representation of programming concepts such as element-wise operations on vectors and the enclosure-relationships between environments. On the other hand, even though he doesn’t assume that the reader has prior coding experience, Grolemund does assume some prior background in data analysis and a strong motivation, on the reader’s part, to persevere with nontrivial R-programming issues such as lexical scoping in the hope of eventual payoffs in programming for data science. In short, Grolemund also assumes more computational maturity than will usually be found among beginning programmers at many small liberal arts colleges.
Hence the niche for the Notes offered here. I aim to be more copious and slower-paced than Grolemund and less sophisticated than Matloff. These notes will also contain a more extensive set of problems, ranging in difficulty from practice exercises to fairly extended projects that students might write up in R Markdown documents.
Experienced programmers and R enthusiasts will be struck by the absence of certain topics. Programmers will observe that there is no real attention to algorithms (sorting is just order()), and although functions receive lots of attention there is no mention of recursion. In future editions I might cover recursion, as I believe that it is wonderful for the development of thinking skills, but it’s not likely that a web developer or data analyst would have the need to write a recursive function. Time spent on recursion and on various efficient algorithms for sorting and searching may be better spent, in my view, on extended programming projects, Shiny apps and blogging, and the introduction of programmer’s trade-tools such as version control and GitHub. I hope by the end of the first year to have made time for all of these out-of-book topics.
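For colleagues less familiar with R, the remark that sorting is just order() can be illustrated with a minimal base-R sketch. The data and the variable names (scores, idx, df) are hypothetical, invented only for this illustration:

```r
# Hypothetical data: scores for four characters (names invented for illustration).
scores <- c(Dorothy = 88, Scarecrow = 72, Tinman = 95, Lion = 81)

# order() returns the positions that would arrange the vector in ascending order.
idx <- order(scores)

# Subsetting by those positions yields the sorted vector.
sorted_scores <- scores[idx]

# The same idea sorts the rows of a data frame by one of its columns,
# here from highest score to lowest.
df <- data.frame(
  name = names(scores),
  score = unname(scores),
  stringsAsFactors = FALSE
)
df_sorted <- df[order(df$score, decreasing = TRUE), ]
```

The student never writes a sorting algorithm; the work lies in understanding that order() produces an index vector and that subsetting with it does the rearranging.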
The Notes give more attention to base R functions than other introductory texts directed to data analysts, but we do introduce elements of the tidyverse as appropriate. The pipe operator is introduced in connection with data frames, ggplot2 and graphing are treated in some detail, string operations and regular expressions are managed primarily with stringr, and the approach to higher-order functions is through purrr. Full treatment of data wrangling is deferred, however, to later courses.
The first-semester course is required for mathematics and physics majors and for students in our pre-engineering program, so a central application of the early material is simulation of random processes. I believe that this makes the Notes relevant for students in other disciplines—e.g., biology and finance—in a way that complements their use of R for data analysis.
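As a rough indication of what simulating a random process looks like at this level, here is a minimal base-R sketch that estimates the chance that two fair dice sum to seven. The example, the seed and the variable names are my own assumptions for illustration and are not drawn from the Notes themselves:

```r
# Arbitrary seed, chosen only so that the run is reproducible.
set.seed(2020)

# Number of simulated rolls of a pair of dice (an assumed value).
n_sims <- 10000

# Roll each die n_sims times, sampling with replacement from 1 to 6.
die1 <- sample(1:6, size = n_sims, replace = TRUE)
die2 <- sample(1:6, size = n_sims, replace = TRUE)

# The proportion of rolls summing to 7 estimates the true probability, 1/6.
est <- mean(die1 + die2 == 7)
est
```

The sketch relies only on vectors, element-wise comparison and mean() applied to a logical vector, all of which a first-semester student can be expected to handle.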
Two of the most fundamental topics in any comprehensive discussion of the R language (lexical scoping and computing on the language) are absent from this book. Lexical scoping and its implications are mentioned only in a brief footnote. Partly this is due to the fact that most of the elementary applications of lexical scoping mentioned in the literature are related to scientific computing, which won’t be a concern for most of my students. Certainly lexical scoping is important for understanding how R-packages work, but elementary students don’t author packages. As for computing on the language, it is true that users are affected by it all the time (e.g., whenever they use functions with a formula interface), but generally one need not perform any computation on the language until one begins writing formula-interface functions for the benefit of casual R-users.
Regular expressions, by contrast, do receive attention in these Notes, for three reasons:
- they are useful in data analysis;
- I have not found a treatment of regular expressions in R that a person without significant prior exposure to them in other languages has a prayer of following;
- and if you master regular expressions, you feel like a wizard.
As for the numerous Wizard of Oz-themed examples, I can offer no defense other than haste in composition and the fact that the Wizard of Oz is now in the public domain.