Exercises

  1. Write a function called findMister() that, when given any string, will return a character vector of the words that immediately follow the string “Mister”, with exactly one space in between. The function should take a single argument called str, the string to search. A typical example of use is as follows:

    text <- "Here are Mister Tom, MisterJerry, Mister Mister, and Mister\tJoe."
    findMister(text)
    ## [1] "Tom"    "Mister"
  2. Write a function called findMr() that, when given any string, will return a character vector of all words following the string “Mr.”, with exactly one space in between. The function should take a single argument called str, the string to search. A typical example of use is as follows:

    text <- "Here are Mr. Tom, Mr Jerry, Mr. Mister, and Mr.\tJoe."
    findMr(text)
    ## [1] "Tom"    "Mister"
  3. For each of the following descriptions, write a regular expression to test whether any of the sub-strings described occurs in a given string. The regular expression should match every sub-string described and should not match any other sub-string. Try to make the regular expression as short as possible. Write it as a string that could be used in one of R’s regex functions (i.e., with extra backslash escapes as needed). The first item is done for you as an example.
    • bot and bat. Regex string: "b[oa]t". (This is the one to submit, because it’s shorter than alternatives such as "bot|bat".)
    • cart and cars and carp.
    • slick and sick
    • Any word ending in ity (such as velocity and ferocity). Be sure to pay attention to word-boundaries. You should match “velocity” but not “ velocity” (which includes a space before the “v”) or “velocity;”.
    • A whole number consisting of more than six digits.
    • A word that is between 3 and 6 characters long. Pay attention to word-boundaries.
    • One or more white-space characters, followed by a hyphen or a semicolon or a colon.
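
    As a quick check of the worked example, the sketch below tests "b[oa]t" against a few sample strings. It assumes the stringr package is installed, and the sample strings are made up for illustration. The last call shows what the extra backslash escape looks like for a pattern that needs one.

    ```r
    library(stringr)

    # The worked example: "b", then "o" or "a", then "t".
    str_detect(c("robot", "bat", "bit", "boat"), "b[oa]t")
    ## [1]  TRUE  TRUE FALSE FALSE

    # The regex \d (a digit) must be written "\\d" inside an R string.
    str_detect(c("room 101", "lobby"), "\\d")
    ## [1]  TRUE FALSE
    ```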
  4. Write a function called findTitled() that, when given any string, will return a character vector of all words following any one of these titles:
    • “Mr.”
    • “Mister”
    • “Missus”
    • “Mrs.”
    • “Miss”
    • “Ms.”

    There should be exactly one space between the title and the following word. The function should take a single argument called str, the string to search. A typical example of use is as follows:

    text <- "Here are Mr. Tom, Ms. Thatcher, Miss Ellen, and Helen."
    findTitled(text)
    ## [1] "Tom"      "Thatcher" "Ellen"
  5. Write a function called capRepeats() that, when given a string, searches for all repeated-word pairs (with at least one character of white-space in between) and replaces them with the same pair where all letters are capitalized. The function should take a single argument called str, the string to be searched. A typical example of use would be as follows:

    capRepeats("I have a boo boo on my knee    \tknee!")
    ## [1] "I have a BOO BOO on my KNEE    \tKNEE!"
  6. Use str_subset() to write a function called longWord() that, when given a character vector of strings, returns a vector consisting of the strings that contain a word at least eight characters long. The function should take a single argument called strs. An example of use would be:

    myText <- c("Very short words.", "Got a gargantuan word.", "More short words!")
    longWord(strs = myText)
    ## [1] "Got a gargantuan word."
  7. Write a function called longWord2() that, when given a character vector of strings, returns a list of character vectors, where each vector consists of the words in the corresponding string that are at least eight characters long. The function should take a single argument called strs. An example of use would be:

    myText <- c("Very short words.", "Got a gargantuan word.", "More short words!")
    longWord2(strs = myText)
    ## [[1]]
    ## character(0)
    ## 
    ## [[2]]
    ## [1] "gargantuan"
    ## 
    ## [[3]]
    ## character(0)
  8. Write a function called phoneNumber() that, when given a vector of strings, returns a logical vector indicating which of the strings contain a valid phone number. For our purposes, a valid phone number is any string of the form

    xxx-xxx-xxxx

    or

    xxx.xxx.xxxx

    Thus, 502-863-8111 is valid and so is 502.863.8111, but not 502-863.8111.

    In the code for the function, specify the pattern using (?x) so you can ignore whitespace and leave detailed comments for each portion of the regular expression.
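
    To illustrate the (?x) style on a different problem, here is a minimal sketch that detects a 24-hour time such as 09:30. The name timePattern and the sample strings are hypothetical and not part of the exercise, and the sketch assumes the stringr package is installed. Inside a (?x) pattern, white-space is ignored and everything from # to the end of the line is treated as a comment.

    ```r
    library(stringr)

    # Hypothetical pattern, for illustration only: a 24-hour time like 09:30.
    timePattern <- "(?x)      # turn on free-spacing mode
      \\b                     # start at a word boundary
      ([01]\\d|2[0-3])        # hours: 00 to 19, or 20 to 23
      :                       # a literal colon
      [0-5]\\d                # minutes: 00 to 59
      \\b                     # end at a word boundary
    "

    str_detect(c("Meet at 09:30.", "Meet at 25:99."), timePattern)
    ## [1]  TRUE FALSE
    ```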

    The function should take a single parameter called strs. A typical example of use would be:

    sentences <- c("Ted's number is 606-255-3143.",
                   "Rhonda's number is 403-28-1259.",
                   "Lydia's number is 502.255.3921.",
                   "Raj's number is 502.367-4432.")
    phoneNumber(strs = sentences)
    ## [1]  TRUE FALSE  TRUE FALSE