6.3 Example: Chance of a Triangle
In the examples considered so far, intuition or a simple calculation suggests what the exact value of the probability or expected value should be. The true power of the Monte Carlo simulation method shines forth in situations where it is difficult or impossible to compute a value mathematically.
Let’s revisit an example from Section 4.2.4, where we considered whether or not three segments could be put together to form a triangle. We developed the following function to determine, from the lengths of each segment, whether or not a triangle is possible:
<- function(x, y, z) {
isTriangle + y > z) & (x +z > y) & (y + z > x)
(x }
Recall also that this function applied to vectors of any length, so we can use it to decide about many triples of segments at once.
Let’s now inject some chance variation into the situation. Suppose that you have a stick that is one unit in length: one foot, one meter, one yard—whatever. You plan to break the stick at two randomly selected points. Breaking the stick will yield three pieces. You wonder: what is the chance that these three pieces can form a triangle?
In an advanced probability course you can show that the probability of a triangle is exactly \(1/4\). Here we would like to estimate the probability with Monte Carlo simulation.
In order to carry out the simulation, we will need to develop code that produces a pair of break-points, any number of times that we like—let’s think about ten times, to start.
This is not so difficult if we use the runif()
function:
<- runif(10) # the first breaks
x <- runif(10) # the second breaks y
Next we must compute—for each pair of breaks—the lengths of the three segments that are produced. If we knew that the x
break was always less than the y
break, this would be easy:
- the leftmost piece would be
x
units long; - the middle piece would be
y
minusx
units long; - the rightmost piece would be 1 minus
y
units long.
The problem is that a given element of the vector y
can easily be less than than corresponding element of the vector x
. When that happens, the pieces won’t be as described in the bullet-ed items above.
We can solve this problem with the pmin()
and pmax()
functions that were introduced in Section 2.7.1.5:
<- pmin(x, y)
a <- pmax(x, y)
b <- a # leftmost
side1 <- b - a # middle
side2 <- 1 - b # rigttmost side3
It is useful to write a helper-function that starts from the breaks, computes the sides, and then decides whether they form a triangle:
<- function(x, y) {
makesTriangle <- pmin(x, y)
a <- pmax(x, y)
b <- a
side1 <- b - a
side2 <- 1 - b
side3 isTriangle(x = side1, y = side2, z = side3)
}
Let’s try it our for our repetitions
Table 6.1 shows the results. Based on our ten repetitions, we would estimate the probability of a triangle as \(4/10\).
x | y | side1 | side2 | side3 | triangle |
---|---|---|---|---|---|
0.6469028 | 0.7644140 | 0.6469028 | 0.1175112 | 0.2355860 | FALSE |
0.3942258 | 0.7438358 | 0.3942258 | 0.3496100 | 0.2561642 | TRUE |
0.6185018 | 0.8261657 | 0.6185018 | 0.2076639 | 0.1738343 | FALSE |
0.4768911 | 0.4227291 | 0.4227291 | 0.0541621 | 0.5231089 | FALSE |
0.1360972 | 0.4092877 | 0.1360972 | 0.2731905 | 0.5907123 | FALSE |
0.0673844 | 0.5396926 | 0.0673844 | 0.4723082 | 0.4603074 | TRUE |
0.1291526 | 0.9607224 | 0.1291526 | 0.8315698 | 0.0392776 | FALSE |
0.3931179 | 0.6535573 | 0.3931179 | 0.2604394 | 0.3464427 | TRUE |
0.0025827 | 0.5467153 | 0.0025827 | 0.5441326 | 0.4532847 | FALSE |
0.6202060 | 0.2660636 | 0.2660636 | 0.3541424 | 0.3797940 | TRUE |
Interestingly, R can compute the proportion of times that a triangle was formed, directly from the logical vector triangle
. Look at the following code:
sum(triangle)
## [1] 4
mean(triangle)
## [1] 0.4
When we ask R to add the elements of a logical vector, it coerces the Boolean values to numbers: TRUE
becomes 1 and FALSE
becomes 0. Summing the numbers is then equivalent to counting how many times TRUE
appeared in triangle
. Taking the mean is equivalent to computing the proportion of TRUE
s in triangle
.
We would like, of course, to simulate breaking the unit length many times, and we would like to be able to choose easily—without a lot of fuss in coding—the number of repetitions. It makes sense, therefore, to write a simulation function that will do our work for us:
<- function(reps = 10000 ) {
triangleSim <- runif(reps)
cut1 <- runif(reps)
cut2 <- makesTriangle(cut1, cut2)
triangle cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}
Now that we think about it, it might also be nice to have the option to view a table of the results, along with the estimate of the probability:
<- function(reps = 10000, table = FALSE ) {
triangleSim <- runif(reps)
cut1 <- runif(reps)
cut2 <- makesTriangle(cut1, cut2)
triangle if ( table ) {
cat("Here is a table of the results:\n")
print(table(triangle))
cat("\n")
}cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}
Let’s give our function a try, using the default of ten thousand repetitions and asking for a table of results:
triangleSim(table = TRUE)
## Here is a table of the results:
## triangle
## FALSE TRUE
## 7466 2534
##
## The proportion of triangles was 0.2534.
As we would expect, our estimate of the probability of a triangle is pretty close to \(1/4\), the value that is known to experts in probability.
6.3.1 Setting a Seed
There are further possibilities for refining the simulation function. We know that it can be a good idea to set a seed for R’s random-number generator. When you do this you still get random-looking results, but they will be the same results no matter how often you call the simulation function. In that way others who have access to your function can “reproduce” the results you got, thus assuring themselves that you weren’t making anything up. Let’s add a seed
parameter to our simulation function:
<- function(reps = 10000, table = FALSE, seed ) {
triangleSim set.seed(seed)
<- runif(reps)
cut1 <- runif(reps)
cut2 <- makesTriangle(cut1, cut2)
triangle if ( table ) {
cat("Here is a table of the results:\n\n")
print(table(triangle))
cat("\n")
}cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}
This is all very well, but if for some reason you desire to have the function produce different results with every call, you have to come up with the seed yourself. It would be nice to have the option to call your function without having to provide a seed.
This is not difficult to accomplish. We’ll set the default value for seed
to be NULL
. Then when the function begins we’ll check the value of seed
. If it is not NULL
, then the user has provided a seed, and we’ll use that. Otherwise, we’ll let R grab pseudo-random numbers however it likes.
<- function(reps = 10000, table = FALSE, seed = NULL) {
triangleSim if ( !is.null(seed) ) {
set.seed(seed)
}<- runif(reps)
cut1 <- runif(reps)
cut2 <- makesTriangle(cut1, cut2)
triangle if ( table ) {
cat("Here is a table of the results:\n\n")
print(table(triangle))
cat("\n")
}cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}
6.3.2 Program Development
Think about how we developed the triangleSim()
function.
- We began by writing simple code that got us a simulation for a few repetitions.
- When we were sure how to make a simulation work, we encapsulated our code into a function that we could call to perform the simulation for us.
- We though about features that we would like the function to have—user chooses number of repetitions, user chooses whether to see a table, etc.—and implemented these features one by one. This made the function more general, i.e., useful in a wider range of settings.
The above method of program development is called encapsulation and generalization. For small projects of the sort that we have on hand, it’s a good way to go about writing a computer program.
6.3.3 Number of Repetitions
The Law of Large Numbers says that the more times you repeat the simulation, the better our estimate of the desired probability—or expected value—is liable to be. Let’s use the Stick-Splitting Problem to investigate the effect of the choice of repetitions on the accuracy of our estimate of the probability of getting a triangle.
We will need to run the triangleSim()
function several times, with a different choice for reps
each time, keeping track of the estimates we obtain. In the process we won’t need tables or other output to the console. Let’s rewrite triangleSim()
for maximum flexibility, making console output optional and using the invisible return discussed in Section 4.2.3.17
<- function(reps = 10000, table = FALSE, seed = NULL,
triangleSim verbose = TRUE) {
if ( !is.null(seed) ) {
set.seed(seed)
}<- runif(reps)
cut1 <- runif(reps)
cut2 <- makesTriangle(cut1, cut2)
triangle if ( table ) {
cat("Here is a table of the results:\n\n")
print(table(triangle))
cat("\n")
}if ( verbose ) {
cat("The proportion of triangles was ", mean(triangle), ".\n", sep = "")
}invisible(mean(triangle))
}
Next, we make a vector of repetitions, and a vector to hold our estimates:
<- c(100, 1000, 10000, 100000, 1000000)
reps <- numeric(length(reps)) estimates
Next we loop through reps
, performing a complete simulation for each repetition-number and storing our results:
for ( i in 1:length(reps)) {
<- triangleSim(reps = reps[i], seed = 3030,
estimates[i] verbose = FALSE)
}
Let’s have a look at our estimates:
names(estimates) <- reps
estimates
## 100 1000 10000 1e+05 1e+06
## 0.200000 0.245000 0.254700 0.251570 0.249779
The estimates are indeed getting closer to the theoretical value of \(1/4\).18
6.3.4 Practice Exercises
Rewrite the
diceGameSim()
function from the previous section so that, just as we did with thetriangleSim()
function in this section:- it allows the user to set a seed;
- it returns the estimated expected value invisibly, and has a
verbose
option tocat
the estimate out to the Console.
Rewrite the
photonTorpedoSim()
function from the previous section so that, just as we did with thetriangleSim()
function in this section:- it allows the user to set a seed;
- it returns the estimated expected value invisibly, and has a
verbose
option tocat
the estimate out to the Console.
Extend the
photonTorpedoSim()
just a bit more, by adding two new parameters:shield
: the level at which the Enterpise’s shields are set. (If the strength of the torpedo is less thanshield
, then there is zero damage.) The default value ofshield
should be 5.upper
: the upper bound on the strenght of a torpedo-shot. (That is, a tropedo shot will be a random real number from 0 toupper
.) The default value ofupper
should be 10.
The program should stop with a helpful message if the user sets
upper
orshield
to a negative number.`Abe and Bo engage in a duel. They take turns shooting at each other until one person manages to hit the other, thus winning the dual. Abe has a 40 percent chance of hitting with each shot he takes, whereas Bo has a 60 percent chance of hitting on each of his shots. Since Abe is the weaker shooter, Bo generously permits Abe to go first. We would like to estimate Abe’s chance of winning the duel. Write a function called
dual()
to estimate this probability. The function should have two parameters:reps
: the number of times to simulate a dual. The default should be 10,000.seed
: a seed that the user can set. The default value should beNULL
.
6.3.5 Solutions to Practice Exercises
Here’s the code:
<- function(reps, seed = NULL, verbose = TRUE) { diceGameSim if (!is.null(seed) ) { set.seed(seed) }<- c(-1, 0, 3) w <- c(2/6, 3/6, 1/6) pw <- sample(w, size = reps, winnings replace = TRUE, prob = pw) if ( verbose ) { cat("The expected amount to win is about ", mean(winnings), ".\n", sep = "") }invisible(mean(winnings)) }
Let’s try it out:
diceGameSim(reps = 100000, seed = 5252)
## The expected amount to win is about 0.17503.
Here’s the code, if you use the
pmax()
function:<- function(reps, seed = NULL, verbose = TRUE) { photonTorpedoSim if (!is.null(seed) ) { set.seed(seed) }<- runif(reps, min = 0, max = 12) strengths <- pmax(0, strengths - 5) damage if ( verbose ) { cat("The expected damage is about ", mean(damage), " units.\n", sep = "") }invisible(mean(damage)) }
Let’s try it out:
photonTorpedoSim(reps = 100000, seed = 4040)
## The expected damage is about 2.034732 units.
Here’s the code:
<- function(reps, seed = NULL, verbose = TRUE, photonTorpedoSim shield = 5, upper = 10) { if (!is.null(seed) ) { set.seed(seed) }if ( upper < 0 ) { return(cat("Upper bound on torpedo strength cannot be a negative number!")) }if ( shield < 0 ) { return(cat("Shield strength cannot be set to a negative number!")) }<- runif(reps, min = 0, max = upper) strengths <- pmax(0, strengths - shield) damage if ( verbose ) { cat("The expected damage is about ", mean(damage), " units.\n", sep = "") }invisible(mean(damage)) }
Let’s try it out:
photonTorpedoSim(reps = 100000, seed = 4040, upper = 15, shield = 12)
## The expected damage is about 0.2991273 units.
Here’s the function:
<- function(reps = 10000, seed = NULL) { dual if (!is.null(seed)) { set.seed(seed) }# set up logical vector to store results of each simulation: <- logical(reps) abeWins for (i in 1:reps) { <- TRUE dualing <- 1 shotNumber while (dualing) { # if i is an odd number, then Abe is shooting: <- shotNumber %% 2 == 1 abeToShoot # probability of hitting depends on who is shooting: <- ifelse(abeToShoot, 0.4, 0.6) hitProb # shooter takes a shot: <- sample(c("hit", "miss"), resultOfShot size = 1, prob = c(hitProb, 1 - hitProb)) if (resultOfShot == "hit") { <- ifelse(abeToShoot, TRUE, FALSE) abeWins[i] <- FALSE dualing }<- shotNumber + 1 shotNumber } }# count up how many times abe won, divide by number of plays, # and return: sum(abeWins)/reps }
Let’s try it out:
dual(reps = 100000, seed = 3939)
## [1] 0.52441
Looks like Abe has a slight advantage!
The addition of the invisible return may be thought of as yet another step in the encapsulation-and-generalization process of program development!↩︎
Statistical theory tells us that when we use Monte Carlo simulation to estimate a probability or an expected value, then the likely size of the error in the approximation is inversely proportional to the square root of the number of repetitions we use. From this it follows that to cut the likely size of the error in half you should increase the number of repetitions by a factor of 4. If you want the likely size to be one-third as big, increase repetitions by a factor of 9, and so on.↩︎