13.2 Reading Text Files with readLines()

Let’s transfer the contents of words.txt to a character vector in R. The function for this is readLines():

words <- readLines(con = "downloads/words.txt")

readLines() is similar to readline(), which we have used in the past to get a single line of input from the user. Like readline(), readLines() can read from the console (this is called standard input), but it can also make a connection (see note 1) with any file. In the code above, we set the connection to be the file words.txt. By default, readLines() reads the entire file and stores each line as a separate string. The resulting character vector is named words (see note 2).
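To make the behavior described above concrete, here is a minimal sketch that does not depend on words.txt. The temporary file and the variable names (tmp, all_lines, first_two, conn) are made up for illustration. It shows readLines() reading a whole file, reading a limited number of lines with the n argument, and reading step by step from a connection opened explicitly with file().

```r
# Create a small temporary file so the example does not need words.txt.
tmp <- tempfile(fileext = ".txt")
writeLines(c("aa", "aah", "aahed"), con = tmp)

# Given a file path, readLines() opens the file, reads every line, and closes it.
all_lines <- readLines(con = tmp)

# The n argument limits how many lines are read.
first_two <- readLines(con = tmp, n = 2)

# An explicitly opened connection remembers its position between reads.
conn <- file(tmp, open = "r")
one <- readLines(conn, n = 1)   # first line of the file
two <- readLines(conn, n = 1)   # next line, continuing where the last read stopped
close(conn)

# Remove the temporary file.
unlink(tmp)
```

Passing a path is usually all you need. An explicit connection is mainly useful when you want to read a large file a few lines at a time.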

Let’s look at words to see if it came out the way we expected. The head() function works not only on data frames but also on vectors:

head(words)

## [1] "aa"     "aah"    "aahed"  "aahing" "aahs"   "aal"

Yes, it looks like we got one word into each element of the vector. The total number of words is:

length(words)

## [1] 113809
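The same kind of check can be tried on any character vector. The sketch below uses a short stand-in vector, demo_words, which is made up for illustration and is not the full word list. It shows how head(), tail(), and length() behave on a vector.

```r
# A small stand-in vector (hypothetical; not the full contents of words.txt).
demo_words <- c("aa", "aah", "aahed", "aahing", "aahs", "aal", "aalii", "aaliis")

# head() returns the first six elements by default.
first_six <- head(demo_words)

# The n argument changes how many elements are returned.
first_three <- head(demo_words, n = 3)

# tail() does the same from the other end of the vector.
last_two <- tail(demo_words, n = 2)

# length() counts the elements, here one per word.
n_words <- length(demo_words)
```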

  1. A connection, also known to R programmers as a “generalized file”, is any one of a wide range of interfaces that can be established with the world outside the R session for the purpose of sending or receiving data. One can make a connection with files stored locally on one’s computer, with URLs on the internet, with databases, and so on.

  2. Note that the stringr package contains a vector named words that is used for practicing with regular expressions. This vector is now masked by the words vector you have created, but you can still access it as stringr::words.
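As a rough illustration of the "generalized file" idea in note 1, the sketch below reads lines from two connections that are not ordinary text files: an in-memory text connection and a gzip-compressed file. The names conn, gz_out, gz_in, and the temporary file are made up for illustration. A connection created with url() would be read in the same way, but it is left out here because it needs network access.

```r
# An in-memory text connection: no file on disk is involved.
conn <- textConnection(c("first line", "second line"))
from_memory <- readLines(conn)
close(conn)

# Write a small gzip-compressed file through a gzfile() connection ...
tmp <- tempfile(fileext = ".gz")
gz_out <- gzfile(tmp, open = "w")
writeLines(c("aa", "aah"), con = gz_out)
close(gz_out)

# ... and read it back; readLines() sees ordinary lines of text.
gz_in <- gzfile(tmp, open = "r")
from_gz <- readLines(gz_in)
close(gz_in)

# Remove the temporary file.
unlink(tmp)
```

In each case readLines() is called the same way. Only the function that creates the connection changes.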