6.2 Monte Carlo Simulation

Our definitions of probability and expected value both involved a limiting notion, namely: what would happen if you could somehow repeat the random process more and more times, without a bound on the number of repetitions. Accordingly, even if we find that we are unable to compute a probability or an expected value exactly with mathematics, we can still attempt to estimate it by making the computer repeat the random experiment many times, keeping track of the result of the experiment each time. This technique is known as Monte Carlo simulation, after the famous Monte Carlo casino in the Principality of Monaco.

In this section we will employ Monte Carlo simulation to estimate probability and expected value in a couple of simple examples.

6.2.1 Estimating a Probability

Consider a box that holds ten tickets. Three of them are labeled with the letter “a”; the rest are labeled “b”:

tickets <- c(rep("a", 3), rep("b", 7))
tickets

##  [1] "a" "a" "a" "b" "b" "b" "b" "b" "b" "b"

We plan to draw one of these tickets at random from the box, with each ticket having the same chance to be the ticket selected. Three of the ten tickets are a’s, so our intuition says that the probability of selecting an “a” ticket is 3 out of 10, or 0.3. Let’s use Monte Carlo simulation to estimate the probability, and see if we get something close to 0.3.

In order to set up the simulation, we need a device for repeating the random process as many times as we would like. At each repetition, the outcome of the chance process should not depend on previous outcomes. We can accomplish this by using R s sample() function on the tickets variable. For example, if we would like to repeat the process of selecting a ticket twenty times, we could write the following code:

sims <- sample(tickets, size = 20, replace = TRUE)
sims

##  [1] "a" "b" "a" "b" "b" "b" "b" "a" "b" "a" "b" "a" "b" "a" "b" "b" "a" "b" "a" "a"

Notice that we set replace to TRUE. This was important, since it guarantees that at each selection there are three “a”’s and seven “b”’s in the box, keeping the chance of getting an “a”-ticket the same each time.

We can summarize the results in a table using R’s table() function:

table(sims)

## sims
##  a  b 
##  9 11

Of course we don’t care so much about the number of times we got an “a”-ticket. Instead we care about the proportion of times we did so. In order to convert the numbers in a table to proportions we can make the table an argument for the function prop.table():

prop.table(table(sims))

## sims
##    a    b 
## 0.45 0.55

Based on twenty repetitions of the random process, our estimate of the chance an “a”-ticket is 0.25. That’s not so close to the true chance of \(1/3\). In order to obtain a more accurate estimate, we should increase the number of repetitions of the random process. Let’s try again, with ten thousand repetitions:

sims <- sample(tickets, size = 10000, replace = TRUE)
prop.table(table(sims))

## sims
##      a      b 
## 0.2952 0.7048

Our estimate is now a lot closer to the actual value of the probability.

6.2.2 Estimating an Expected Value

Imagine that you are about to play the following game: you will flip a fair coin twice.

If you get Tails both times, you lose a dollar.
If you get exactly one Head, nothing happens.
If you get Heads both times, you win two dollars.

Let \(W\) be the number of dollars you will win.

\(W\) is a clearly a random variable: it’s a number whose value—either -1, 0 or 2—depends on the outcome of the random process of flipping the fair coin twice. What is the expected value of \(W\)?

If we think about the probabilities involved then we can come up with a candidate for the expected value. When you flip a fair coin twice, there are four equally likely outcomes:

Tails and then Tails (\(W = -1\))
Tails and then Heads (\(W = 0\))
Heads and then Tails (\(W = 0\))
Heads and then Heads (\(W = 2\))

Hence you have:

The chance that \(W=-1\) is 0.25.
The chance that \(W=0\) is 0.50.
The chance that \(W=2\) is 0.25.

Hence the expected value should be the weighted average:

\[0.25 \times -1 + 0.50 \times 0 + 0.25 \times 2.\] This works out to 0.25, or 25 cents.

We would like to see if Monte Carlo simulation can render an estimate that is close to this value.

For a simple game like ours where there are only a few possible outcomes for the random variable, it is still a good idea to use sample() to simulate the random process of playing the game. We only need to provide a vector of possible values for \(W\) to sample from, and a vector of probabilities for obtaining each of those possible values. The following code illustrates how we might simulate playing the game ten times:

w <- c(-1, 0, 2)
pw <- c(0.25, 0.50, 0.25)
winnings <- sample(w, size = 10, replace = TRUE, prob = pw)
winnings

##  [1] -1  2  0  2  0  0  0  0 -1  2

The Monte Carlo estimate of the expected value of \(W\) is just the average of the winnings in our simulations:

mean(winnings)

## [1] 0.4

Of course, with such a small number of repetitions of the game we cannot expect the Monte Carlo estimate to be very accurate. Let’s try again, but with ten thousand repetitions:

winnings <- sample(w, size = 10000, replace = TRUE, prob = pw)
mean(winnings)

## [1] 0.2435

Yes, that’s much closer to our mathematically-computed value!

By the way, R can easily compute the weighted average itself:

sum(w*pw)

## [1] 0.25

6.2.3 The Law of Large Numbers

In our Monte Carlo simulations so far, we have seen that the more times we repeat the underlying random process, the closer our estimate is likely to be to the actual value, no matter whether we were estimating the probability of an event or an expected value for a random variable.

This is no accident: in fact, it is a consequence of a theorem in the subject of probability that is known as the Law of Large Numbers. We do not possess the mathematical machinery necessary to state the Law precisely—much less prove it—but we can take the rough statement given here as an assurance that the more repetitions we include in our simulation, the more accurate the resulting estimate is liable to be.

6.2.4 Practice Exercises

Consider the following game: you roll a fair die. If you get one or two spots, then you lose a dollar. If you get three, four, or five spots then nothing happens. If you get six spots then you win three dollars. You would like to know how much money you can expect to win per play of the game, on average.

Write a function called diceGameSim() that will help you answer the question. The function should take the following parameter:
- reps: the number of times to repeat the simulation. The default value should be 10000.
The function should return an estimate of the expected value of the game.
You command a Klingon vessel, and you are shooting photon torpedoes at the Starship USS Enterprise. Every torpedo you fire has a strength \(s\) that is a random real number between 0 and 12. If the strength of the torpedo is less than or equal to 5, then the Enterprise’s shields will ward off any damage. However, if the strength \(s\) exceeds 5, then the damage inflicted on the Enterprise will be \(12-s\) units. What is the expected amount of damage done to the Enterprise, per shot? Write a function called photonTorpedoSim() that estimates the expected damage. The function should take the following parameter:
- reps: the number of times to repeat the process of firing at the Enterprise. The default value should be 10000.
The function should return the estimated expected value.

6.2.5 Solutions to the Practice Exercises

Here’s the code:

diceGameSim <- function(reps) {
  w <- c(-1, 0, 3)
  pw <- c(2/6, 3/6, 1/6)
  winnings <- sample(w, size = reps, 
                     replace = TRUE,
                     prob = pw)
  mean(winnings)
}

Let’s try it out:

set.seed(3838)
diceGameSim(reps = 100000)

## [1] 0.16548

Our estimate, based on one hundred thousand repetitions of the game, is that you can expect to win about 16.55 cents per game. (In fact the exact expected value is \(1/6\) of a dollar.)

Here’s the code, if you use the pmax() function:

photonTorpedoSim <- function(reps) {
  strengths <- runif(reps, min = 0, max = 12)
  damage <- pmax(0, strengths - 5)
  mean(damage)
}

Let’s try it out:

set.seed(3838)
photonTorpedoSim(reps = 100000)

## [1] 2.043227

You can expect to do about 2.04 points worth of damage per shot. (In fact the exact expected value is \(49/24 \approx 2.04167\).)

Here’s the code if you used a loop instead:

photonTorpedoSim <- function(reps) {
  damage <- numeric(reps)
  for (i in 1:reps) {
    strength <- runif(1, min = 0, max = 12)
    if (strength > 5) {
      damage[i] <- strength - 5
    } else {
      damage[i] <- 0
    }
  }
  mean(damage)
}