## 6.6 Example: How Many Numbers Needed?

Let us pause for a review.

So far our examples have involved estimating a probability: the chance for a particular event to occur. We have accomplished this with Monte Carlo simulation:

• We repeated the chance process a very large number of times
• On each repetition, we recorded whether or not the event occurred.
• After we finished all of the repetitions, we computed the proportion of those repetitions in which the event did occur.
• This proportion served as our estimate of the probability for the event to occur.
• We know that the more repetitions we arrange for, the closer our estimate is liable to be to the actual probability for the event to occur.

Let us now switch gears and take up the task of estimating the expected value of a random variable.

Recall that a random variable is a number whose value is the result of some chance process. The expected value of the random variable is the average—the mean—of its values over many, many repetitions of that chance process.

Let us consider the following chance process:

• Pick a real number $$x_1$$ between 0 and 1 at random.
• Pick another real number $$x_2$$ at random (also between 0 and 1). Add it to $$x_1$$.
• If the sum $$x_1 + x_2$$ exceeds 1, then stop.
• Otherwise, pick a third real number $$x_3$$, and add it to the sum of the previous two.
• If the sum $$x_1+x_2+x_3$$ exceeds 1, then stop.
• Otherwise, pick a fourth real number …

The idea is to keep on going in this way until the sum of the numbers you have picked exceeds 1.

The random variable we are interested in is the number of numbers we have to pick to make the sum of those numbers exceed 1. Let us call this random variable $$X$$. As you can see, this number could be 2, or 3, or 4, or even more depending on how small the random numbers you pick happen to be.

In order to fully grasp what is going on, it will be helpful to go through the process of computing $$X$$. The following function will do this for us:

numberNeeded <- function(verbose = FALSE) {
mySum <- 0
count <- 0
while( mySum < 1 ) {
number <- runif(1)
mySum <- mySum + number
count <- count + 1
if ( verbose ) {
cat("Just now picked ", number, ".\n", sep = "")
cat(count, " number(s) picked so far.\n", sep = "")
cat("Their sum is ", mySum, ".\n\n", sep = "")
}
}
count
}

Call the function a few times, like this:

numberNeeded(verbose = TRUE)
## Just now picked 0.2869393.
## 1 number(s) picked so far.
## Their sum is 0.2869393.
##
## Just now picked 0.7092682.
## 2 number(s) picked so far.
## Their sum is 0.9962075.
##
## Just now picked 0.7524099.
## 3 number(s) picked so far.
## Their sum is 1.748617.
## [1] 3

As you can see, it’s quite common to need 2 or 3 numbers to make the sum exceed 1, but sometimes, you need 4, 5 or even more numbers.

Since we are interested in the expected value of $$X$$, we should repeat the process a large number of times and compute the mean of the $$X$$-values that we find. The following code repeats the process 1000 times:

needed <- numeric(1000)
for (i in 1:1000 ) {
needed[i] <- numberNeeded()
}
cat("The expected number needed is about ",
mean(needed), ".\n", sep = "")
## The expected number needed is about 2.697.

Of course the expected value indicates what $$X$$ will be on average. By itself it doesn’t tell us about the variability in $$X$$—how much $$X$$ tends to “hop around” from one repetition to another. In order to get a sense of the variability of $$X$$, we can make a table of the results of our 1000 repetitions:

table(needed)
## needed
##   2   3   4   5   6   7
## 528 303 123  37   8   1

Sure enough, we usually got 2 or 3, but higher values are certainly possible.

For the most part we are less interested in the number of times we got each possible value than we are in the proportion of times we got each possible value. For this all we need is prop.table():

prop.table(table(needed))
## needed
##     2     3     4     5     6     7
## 0.528 0.303 0.123 0.037 0.008 0.001

The proportions serve as estimates of the probability of getting each of the possible values of $$X$$. Such a table of proportions gives us a sense of the distribution of $$X$$—the chances for $$X$$ to assume each of its possible values.20

Now that we have the basic idea of how to accomplish the simulation, it remains to package up our code into a nice function that will enable a user to set the number of repetitions, set a seed, and choose reporting options. Along the way we will generalize a bit, allowing the user to set a target sum other than 1. We’ll also return our estimate of the expected number needed invisibly, so that the function can be used in the context of larger-scale investigations (see the exercises from this Chapter).

numberNeededSim <- function(target = 1, reps = 1000,
seed = NULL, report = FALSE) {

if ( !is.null(seed) ) {
set.seed(seed)
}

# define helper function
numberNeeded <- function(target) {
mySum <- 0
count <- 0
while( mySum < target ) {
number <- runif(1)
mySum <- mySum + number
count <- count + 1
}
count
}

#perform simulation
needed <- numeric(reps)
for (i in 1:reps ) {
needed[i] <- numberNeeded(target)
}

# report results (if desired)
if ( report ) {
print(prop.table(table(needed)))
cat("\n")
cat("The expected number needed is about ",
mean(needed), ".\n", sep = "")
}
# return estimate of the expected number needed:
invisible(mean(needed))
}

Let’s test the function with ten thousand repetitions:

numberNeededSim(target = 1, reps = 10000, seed = 4848, report = TRUE)
## needed
##      2      3      4      5      6      7      8
## 0.4937 0.3350 0.1313 0.0317 0.0071 0.0010 0.0002
##
## The expected number needed is about 2.7273.

Does the estimate of the expected number of numbers look a bit familiar? Mathematicians have proven that the exact expected value—what you would get for an average if you could do more and more repetitions, without limit—is the famous number $$e$$:

$\mathrm{e} \approx 2.71828 \ldots$ In the exercises you will be asked to investigate the expected value when the target is a number other than 1.

1. Of course not all of the possible values for $$X$$ appear on the table: the only values appearing are those that happened to occur in our simulation. The values that did not occur probably have a very small (but nonzero) chance of occurring.↩︎