6.3 Example: Chance of a Triangle

In the examples considered so far, intuition or a simple calculation suggests what the exact value of the probability or expected value should be. The true power of the Monte Carlo simulation method shines forth in situations where it is difficult or impossible to compute a value mathematically.

Let’s revisit an example from Section 4.2.4, where we considered whether or not three segments could be put together to form a triangle. We developed the following function to determine, from the lengths of each segment, whether or not a triangle is possible:

isTriangle <- function(x, y, z) {
  (x + y > z) & (x +z > y) & (y + z > x)
}

Recall also that this function applied to vectors of any length, so we can use it to decide about many triples of segments at once.

Let’s now inject some chance variation into the situation. Suppose that you have a stick that is one unit in length: one foot, one meter, one yard—whatever. You plan to break the stick at two randomly selected points. Breaking the stick will yield three pieces. You wonder: what is the chance that these three pieces can form a triangle?

In an advanced probability course you can show that the probability of a triangle is exactly \(1/4\). Here we would like to estimate the probability with Monte Carlo simulation.

In order to carry out the simulation, we will need to develop code that produces a pair of break-points, any number of times that we like—let’s think about ten times, to start.

This is not so difficult if we use the runif() function:

x <- runif(10)  # the first breaks
y <- runif(10)  # the second breaks

Next we must compute—for each pair of breaks—the lengths of the three segments that are produced. If we knew that the x break was always less than the y break, this would be easy:

the leftmost piece would be x units long;
the middle piece would be y minus x units long;
the rightmost piece would be 1 minus y units long.

The problem is that a given element of the vector y can easily be less than than corresponding element of the vector x. When that happens, the pieces won’t be as described in the bullet-ed items above.

We can solve this problem with the pmin() and pmax() functions that were introduced in Section 2.7.1.5:

a <- pmin(x, y)
b <- pmax(x, y)
side1 <- a     # leftmost
side2 <- b - a  # middle
side3 <- 1 - b # rigttmost

It is useful to write a helper-function that starts from the breaks, computes the sides, and then decides whether they form a triangle:

makesTriangle <- function(x, y) {
  a <- pmin(x, y)
  b <- pmax(x, y)
  side1 <- a
  side2 <- b - a
  side3 <- 1 - b
  isTriangle(x = side1, y = side2, z = side3)
}

Let’s try it our for our repetitions

Table 6.1 shows the results. Based on our ten repetitions, we would estimate the probability of a triangle as \(4/10\).

Table 6.1: Results of ten repetitions of breaking a unit length at two random points to form three segments.
x	y	side1	side2	side3	triangle
0.6469028	0.7644140	0.6469028	0.1175112	0.2355860	FALSE
0.3942258	0.7438358	0.3942258	0.3496100	0.2561642	TRUE
0.6185018	0.8261657	0.6185018	0.2076639	0.1738343	FALSE
0.4768911	0.4227291	0.4227291	0.0541621	0.5231089	FALSE
0.1360972	0.4092877	0.1360972	0.2731905	0.5907123	FALSE
0.0673844	0.5396926	0.0673844	0.4723082	0.4603074	TRUE
0.1291526	0.9607224	0.1291526	0.8315698	0.0392776	FALSE
0.3931179	0.6535573	0.3931179	0.2604394	0.3464427	TRUE
0.0025827	0.5467153	0.0025827	0.5441326	0.4532847	FALSE
0.6202060	0.2660636	0.2660636	0.3541424	0.3797940	TRUE

Interestingly, R can compute the proportion of times that a triangle was formed, directly from the logical vector triangle. Look at the following code:

sum(triangle)

## [1] 4

mean(triangle)

## [1] 0.4

When we ask R to add the elements of a logical vector, it coerces the Boolean values to numbers: TRUE becomes 1 and FALSE becomes 0. Summing the numbers is then equivalent to counting how many times TRUE appeared in triangle. Taking the mean is equivalent to computing the proportion of TRUEs in triangle.

We would like, of course, to simulate breaking the unit length many times, and we would like to be able to choose easily—without a lot of fuss in coding—the number of repetitions. It makes sense, therefore, to write a simulation function that will do our work for us:

triangleSim <- function(reps = 10000 ) {
  cut1 <- runif(reps)
  cut2 <- runif(reps)
  triangle <- makesTriangle(cut1, cut2)
  cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}

Now that we think about it, it might also be nice to have the option to view a table of the results, along with the estimate of the probability:

triangleSim <- function(reps = 10000, table = FALSE ) {
  cut1 <- runif(reps)
  cut2 <- runif(reps)
  triangle <- makesTriangle(cut1, cut2)
  if ( table ) {
    cat("Here is a table of the results:\n")
    print(table(triangle))
    cat("\n")
  }
  cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}

Let’s give our function a try, using the default of ten thousand repetitions and asking for a table of results:

triangleSim(table = TRUE)

## Here is a table of the results:
## triangle
## FALSE  TRUE 
##  7466  2534 
## 
## The proportion of triangles was 0.2534.

As we would expect, our estimate of the probability of a triangle is pretty close to \(1/4\), the value that is known to experts in probability.

6.3.1 Setting a Seed

There are further possibilities for refining the simulation function. We know that it can be a good idea to set a seed for R’s random-number generator. When you do this you still get random-looking results, but they will be the same results no matter how often you call the simulation function. In that way others who have access to your function can “reproduce” the results you got, thus assuring themselves that you weren’t making anything up. Let’s add a seed parameter to our simulation function:

triangleSim <- function(reps = 10000, table = FALSE, seed ) {
  set.seed(seed)
  cut1 <- runif(reps)
  cut2 <- runif(reps)
  triangle <- makesTriangle(cut1, cut2)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(triangle))
    cat("\n")
  }
  cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}

This is all very well, but if for some reason you desire to have the function produce different results with every call, you have to come up with the seed yourself. It would be nice to have the option to call your function without having to provide a seed.

This is not difficult to accomplish. We’ll set the default value for seed to be NULL. Then when the function begins we’ll check the value of seed. If it is not NULL, then the user has provided a seed, and we’ll use that. Otherwise, we’ll let R grab pseudo-random numbers however it likes.

triangleSim <- function(reps = 10000, table = FALSE, seed = NULL) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  cut1 <- runif(reps)
  cut2 <- runif(reps)
  triangle <- makesTriangle(cut1, cut2)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(triangle))
    cat("\n")
  }
  cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}

6.3.2 Program Development

Think about how we developed the triangleSim() function.

We began by writing simple code that got us a simulation for a few repetitions.
When we were sure how to make a simulation work, we encapsulated our code into a function that we could call to perform the simulation for us.
We though about features that we would like the function to have—user chooses number of repetitions, user chooses whether to see a table, etc.—and implemented these features one by one. This made the function more general, i.e., useful in a wider range of settings.

The above method of program development is called encapsulation and generalization. For small projects of the sort that we have on hand, it’s a good way to go about writing a computer program.

6.3.3 Number of Repetitions

The Law of Large Numbers says that the more times you repeat the simulation, the better our estimate of the desired probability—or expected value—is liable to be. Let’s use the Stick-Splitting Problem to investigate the effect of the choice of repetitions on the accuracy of our estimate of the probability of getting a triangle.

We will need to run the triangleSim() function several times, with a different choice for reps each time, keeping track of the estimates we obtain. In the process we won’t need tables or other output to the console. Let’s rewrite triangleSim() for maximum flexibility, making console output optional and using the invisible return discussed in Section 4.2.3.¹⁷

triangleSim <- function(reps = 10000, table = FALSE, seed = NULL,
                        verbose = TRUE) {
  if ( !is.null(seed) ) {
    set.seed(seed)
  }
  cut1 <- runif(reps)
  cut2 <- runif(reps)
  triangle <- makesTriangle(cut1, cut2)
  if ( table ) {
    cat("Here is a table of the results:\n\n")
    print(table(triangle))
    cat("\n")
  }
  if ( verbose ) {
    cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
  }
  invisible(mean(triangle))
}

Next, we make a vector of repetitions, and a vector to hold our estimates:

reps <- c(100, 1000, 10000, 100000, 1000000)
estimates <- numeric(length(reps))

Next we loop through reps, performing a complete simulation for each repetition-number and storing our results:

for ( i in 1:length(reps)) {
  estimates[i] <- triangleSim(reps = reps[i], seed = 3030,
                             verbose = FALSE)
}

Let’s have a look at our estimates:

names(estimates) <- reps
estimates

##      100     1000    10000    1e+05    1e+06 
## 0.200000 0.245000 0.254700 0.251570 0.249779

The estimates are indeed getting closer to the theoretical value of \(1/4\).¹⁸

6.3.4 Practice Exercises

Rewrite the diceGameSim() function from the previous section so that, just as we did with the triangleSim() function in this section:
- it allows the user to set a seed;
- it returns the estimated expected value invisibly, and has a verbose option to cat the estimate out to the Console.
Rewrite the photonTorpedoSim() function from the previous section so that, just as we did with the triangleSim() function in this section:
- it allows the user to set a seed;
- it returns the estimated expected value invisibly, and has a verbose option to cat the estimate out to the Console.
Extend the photonTorpedoSim() just a bit more, by adding two new parameters:
- shield: the level at which the Enterpise’s shields are set. (If the strength of the torpedo is less than shield, then there is zero damage.) The default value of shield should be 5.
- upper: the upper bound on the strenght of a torpedo-shot. (That is, a tropedo shot will be a random real number from 0 to upper.) The default value of upper should be 10.
The program should stop with a helpful message if the user sets upper or shield to a negative number.`
Abe and Bo engage in a duel. They take turns shooting at each other until one person manages to hit the other, thus winning the dual. Abe has a 40 percent chance of hitting with each shot he takes, whereas Bo has a 60 percent chance of hitting on each of his shots. Since Abe is the weaker shooter, Bo generously permits Abe to go first. We would like to estimate Abe’s chance of winning the duel. Write a function called dual() to estimate this probability. The function should have two parameters:
- reps: the number of times to simulate a dual. The default should be 10,000.
- seed: a seed that the user can set. The default value should be NULL.

6.3.5 Solutions to Practice Exercises

Here’s the code:

diceGameSim <- function(reps, seed = NULL, verbose = TRUE) {
  if (!is.null(seed) ) {
    set.seed(seed)
  }
  w <- c(-1, 0, 3)
  pw <- c(2/6, 3/6, 1/6)
  winnings <- sample(w, size = reps, 
                     replace = TRUE,
                     prob = pw)
  if ( verbose ) {
    cat("The expected amount to win is about ",
        mean(winnings), ".\n", sep = "")
  }
  invisible(mean(winnings))
}

Let’s try it out:

diceGameSim(reps = 100000, seed = 5252)

## The expected amount to win is about 0.17503.

Here’s the code, if you use the pmax() function:

photonTorpedoSim <- function(reps, seed = NULL, verbose = TRUE) {
  if (!is.null(seed) ) {
    set.seed(seed)
  }
  strengths <- runif(reps, min = 0, max = 12)
  damage <- pmax(0, strengths - 5)
  if ( verbose ) {
    cat("The expected damage is about ",
        mean(damage), " units.\n", sep = "")
  }
  invisible(mean(damage))
}

Let’s try it out:

photonTorpedoSim(reps = 100000, seed = 4040)

## The expected damage is about 2.034732 units.

Here’s the code:

photonTorpedoSim <- function(reps, seed = NULL, verbose = TRUE,
                             shield = 5, upper = 10) {
  if (!is.null(seed) ) {
    set.seed(seed)
  }
  if ( upper < 0 ) {
    return(cat("Upper bound on torpedo strength cannot be a negative number!"))
  }
  if ( shield < 0 ) {
    return(cat("Shield strength cannot be set to a negative number!"))
  }
  strengths <- runif(reps, min = 0, max = upper)
  damage <- pmax(0, strengths - shield)
  if ( verbose ) {
    cat("The expected damage is about ",
        mean(damage), " units.\n", sep = "")
  }
  invisible(mean(damage))
}

Let’s try it out:

photonTorpedoSim(reps = 100000, seed = 4040,
                 upper = 15, shield = 12)

## The expected damage is about 0.2991273 units.

Here’s the function:

dual <- function(reps = 10000, seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # set up logical vector to store results of each simulation:
  abeWins <- logical(reps)
  for (i in 1:reps) {
    dualing <- TRUE
    shotNumber <- 1
    while (dualing) {
      # if i is an odd number, then Abe is shooting:
      abeToShoot <- shotNumber %% 2 == 1
      # probability of hitting depends on who is shooting:
      hitProb <- ifelse(abeToShoot, 0.4, 0.6)
      # shooter takes a shot:
      resultOfShot <- sample(c("hit", "miss"),
                             size = 1, 
                             prob = c(hitProb, 1 - hitProb))
      if (resultOfShot == "hit") {
        abeWins[i] <- ifelse(abeToShoot, TRUE, FALSE)
        dualing <- FALSE
      }
      shotNumber <- shotNumber + 1
    }
  }
  # count up how many times abe won, divide by number of plays,
  # and return:
  sum(abeWins)/reps
}

Let’s try it out:

dual(reps = 100000, seed = 3939)

## [1] 0.52441

Looks like Abe has a slight advantage!

The addition of the invisible return may be thought of as yet another step in the encapsulation-and-generalization process of program development!↩︎
Statistical theory tells us that when we use Monte Carlo simulation to estimate a probability or an expected value, then the likely size of the error in the approximation is inversely proportional to the square root of the number of repetitions we use. From this it follows that to cut the likely size of the error in half you should increase the number of repetitions by a factor of 4. If you want the likely size to be one-third as big, increase repetitions by a factor of 9, and so on.↩︎