Tuesday, May 28, 2013

The Monty Hall Problem and a Simulation in R

The Monty Hall problem has possibly the most counter-intuitive result in probability.  An in-depth description is here, but the set-up is easy.  On the game show "Let's Make a Deal", the contestant would be given the choice of three doors, behind one of which was a new car.  Monty would show the contestant what was behind one of the other doors.  Often it would be a goat or some other non-prize.  The key point was that, after showing the contestant the non-prize door, Monty would offer the contestant the chance to switch his choice of doors to the remaining unopened door.

At first glance, there doesn't seem to be any reason for the contestant to switch.  Now that there are only two doors left, he has a 50/50 shot of having the car behind his original door, right?  Nope.  It turns out that he should absolutely switch. 

Why?  The best way I've found to think about it like this...  You pick a door, then Monty gives you the chance to trade your door for both of the other doors.  In that case it's clear you should switch.  That's essentially what is happening in the game, except that Monty is showing you what's behind one of the two other doors ahead of time.  Each door has a 1/3 chance of having a car behind it, so, of course, you want two doors instead of one.

Since this is such an interesting problem with a simple set-up, I though I would write a simulation in R to verify that switching really did give the contestant a 2/3 chance of winning a car.

monty.hall.sim <- function(trials=50000){

    #Returns win percentage of those who switch

    car.location <- sample(c('A','B','C'), trials, replace = TRUE)
    player.pick <- sample(c('A','B','C'), trials, replace = TRUE)
    door.shown <- rep(0,trials)
    door.switched.to <- rep(0,trials)
   
    for(i in 1:trials){
        all.doors <- c('A', 'B', 'C')
        available.doors <- all.doors[!all.doors %in% c(car.location[i], player.pick[i])]
        door.shown[i] <- sample(available.doors,1)
        available.door.for.switch <- all.doors[!all.doors %in% c(player.pick[i], door.shown[i])]
        door.switched.to[i] <- available.door.for.switch
    }
   
    player.wins.originally <- as.numeric(car.location == player.pick)
    player.wins.after.switch <- as.numeric(car.location == door.switched.to)
   
    #For checking
    #MH.df <- data.frame(car.location, player.pick, door.shown, door.switched.to, player.wins.originally, player.wins.after.switch)
   
    nonswitching.win.pct <- sum(player.wins.originally) / trials
    switching.win.pct <- sum(player.wins.after.switch) / trials

    return(switching.win.pct)
}

I wrote this as a function so that I could time it.  There are faster ways of doing it, but this works for now.