# Using Code to Intuit the Monty Hall Problem

June 23, 2020

The Monty Hall problem is a famous puzzle that shows how statistical facts can be
unintuitive. It’s based on a game show in which the contestant selects one of three doors,
receiving whatever prize is behind it. One door has a desirable prize behind it, which is
usually a car, and the others have a gag prize, usually a goat. My father told me about this
problem when I was probably under 10, and at the time I neither understood the math, nor why
anyone would want a car when they could have *a goat*. So for the rest of this article,
assume that the preferred prize is a goat,
and the others are, say, bags of sand.

Anyway, after the contestant selects a door, the show host (originally named Monty Hall) opens another door, revealing a bag of sand. The contestant then has the option to switch to the remaining door. The question is: Is it better to switch or does it make no difference?

The answer, as you may have heard before, is that it’s better to switch. The probability
that your initial pick hides the goat is ⅓, and the host opening another door doesn’t change
that, so the one remaining door hides the goat with probability ⅔. This is confusing,
however, because it seems like after the host shows you
the bag of sand, there’s one door with a goat behind it, and one door with sand behind, so
the probability of winning should be ½, regardless of which door you pick. There are a
zillion articles explaining why this isn’t the case, which comes down to the fact that this
is a *conditional* probability, not just a choice between two random options. However,
depending on your background, you may have a better intuition for code than for math, in
which case the easiest way to understand the problem is by seeing it as a program.

Take the following Python script. If you’re new to Python, don’t be intimidated by the
list comprehensions, which are where you have `for`s and `if`s inside brackets: they are
not hard to understand and they will make your life much easier if you learn to use them.

```
import random, statistics


def simulate(switch):
    """Return True if the contestant wins the goat, False otherwise."""
    # Shuffle the doors and pick one at random
    doors = ['goat', 'sand', 'sand']
    random.shuffle(doors)
    choice = random.randint(0, 2)

    # Pick one of the doors with sand behind it to show
    remaining_sand = [i for i in range(3) if doors[i] == 'sand' and i != choice]
    shown = random.choice(remaining_sand)

    # If switch is selected, switch choice to the remaining door
    if switch: choice = [i for i in range(3) if not i in (choice, shown)][0]

    return doors[choice] == 'goat'


trials = 100000
print("Switch: ", statistics.mean([simulate(True) for i in range(trials)]))
print("Stay: ", statistics.mean([simulate(False) for i in range(trials)]))
```

Run the program, and we see that, indeed, you win twice as often if you switch as if you stay:

```
Switch: 0.66462
Stay: 0.33362
```
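
The rates above are estimates from random trials. As a rough cross-check, the sketch below enumerates every case exactly using `fractions.Fraction`. The helper name `exact_win_probability` is made up for this illustration, and it assumes the same rules as the script: the goat’s door and the first pick are uniformly random, and the host picks uniformly among the sand doors he is allowed to open.

```python
from fractions import Fraction
from itertools import product


def exact_win_probability(switch):
    """Add up the probability of every winning (goat, first pick, shown) case."""
    total = Fraction(0)
    for goat, choice in product(range(3), repeat=2):
        # Doors the host may open: not the contestant's pick, not the goat
        openable = [d for d in range(3) if d != choice and d != goat]
        for shown in openable:
            # Assumption: goat door and first pick are uniform (1/3 each),
            # and the host chooses uniformly among the openable doors.
            weight = Fraction(1, 9) / len(openable)
            final = choice
            if switch:
                final = next(d for d in range(3) if d not in (choice, shown))
            if final == goat:
                total += weight
    return total


print("Switch:", exact_win_probability(True))
print("Stay:  ", exact_win_probability(False))
```

This should print exactly `2/3` for switching and `1/3` for staying, which matches the sampled figures to within noise.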

But we already knew that. The key insight is on line 16:

`if switch: choice = [i for i in range(3) if not i in (choice, shown)][0]`

This is the line that changes the choice to the remaining door. If `switch` is `False`, it
won’t run. Comment out line 16, and your IDE or your linter will tell you something like
this:

`monty-hall.py:13:4: W0612: Unused variable 'shown' (unused-variable)`

When the contestant doesn’t switch, the information provided by opening the door isn’t
used at all. The Python interpreter will still execute the lines computing `shown`, but
nothing reads the result, so deleting them would not change how often the contestant wins.
In other words, choosing not to switch is *exactly the same* as picking randomly from three
options.
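
To make that concrete, here is a minimal sketch of the staying strategy with the host’s lines deleted outright. The function name `simulate_stay_without_host` is hypothetical, not part of the script above, and the fixed seed is only there so repeated runs agree.

```python
import random


def simulate_stay_without_host():
    """Staying strategy with the host's reveal removed entirely."""
    doors = ['goat', 'sand', 'sand']
    random.shuffle(doors)
    choice = random.randint(0, 2)
    # No `shown` is computed here: a contestant who stays never uses it.
    return doors[choice] == 'goat'


random.seed(0)  # assumption: seeded purely for reproducibility
trials = 100000
stay_rate = sum(simulate_stay_without_host() for _ in range(trials)) / trials
print("Stay (no host):", stay_rate)
```

The estimate should land near ⅓, the same as the `Stay` figure from the full script, even though no door is ever opened.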

I decided to write this article because after writing the problem in code, the answer
seemed *completely obvious*. Before, I was sort of able to understand the standard
stats textbook explanation, but the answer still seemed opaque. Hopefully, there will
be a few other people out there to whom code is a more native language than math who will
find this as clarifying as I did.