Monte Carlo Example: Pólya’s Lucky Dip

Edit (2013/10/04): Under the first picture below, I mention a line which should be on all the pictures, but isn’t. This line should be at around 25 for all of them.

Probability has a few standard analogies. Let’s get to grips with one of them.

Let’s say we sit Ronald down in front of a bucket of red and blue magnetic toy fish, and the red fish have a prize written on them. He then catches one of the fish with a magnetic rod. It turns out to be blue, so he stores the fish by clipping it to his beard, and tries again. Assume the rather unlikely case where he knows that there are ten red fish and a hundred blue ones. What’s the chance of winning if he can catch up to three fish?

The standard mathematical way to solve this is to compare the picking-out of fish to one of the classic examples of picking balls out of an urn. But, frankly, it’s early in the morning, and I don’t want to deal with binomial coefficients before I’ve had a few drinks. What I could do with is convincing Ronald to sit around playing lucky dips all day, and see how often he wins. Since it’s a bit late to ask Ronald to fish, I’ll use a computer instead.

If we’d had Ronald to do this, I’d have to choose how many times he got to have a go. Since we’re using a computer, I’m just going to pick a few different numbers, and see what difference it makes. After a break for a drink, this is what I came back to.

Fig1

The dots are the guesses, the line is what I know the true probability to be. However, if I run this again, the result can be very different.

Fig2

Our guesses have a random element to them, so that shouldn’t be surprising. If I let Ronald play ten times, and then ten times again, the two sets of results needn’t have the same number of winners. What this means is that, if I make several guesses with the same number of plays, they’re going to be spread out. Since, in practice, we’re only going to make one guess, we’d like the spread to be pretty small. Hopefully, we can achieve this by increasing the number of plays.

I reworked the program to take a hundred guesses for each number of plays, and then use them to draw boxplots. After a bigger drink, I came back to this.

Fig3

If you’re not used to boxplots, half of the estimates are inside the box, and the other half are inside the dotted lines. Dots are outliers that I’m just going to ignore here. The boxes get smaller as the number of entrants increases, and the box for 10000 plays is tiny. In other words, increasing the number of plays decreases how spread out the estimates will be.

What we can say is that, if you wanted your guess to be precise to within a percentage point or two, you’d need to simulate about ten thousand goes. Maybe it’s just as well we didn’t ask Ronald, I’m out of drinks to bribe him with for that long.

Does the box shrink towards the correct value? We don’t know, unless we work it out. In this case, my throat is now wet enough that I feel up to working it out on paper. In this case the true probability of winning is \frac{82}{327}, or about 25%, so it looks like the guesses tend towards the correct value as we increase the number of plays. It also looks like this would be a terrible lucky dip from the point of view of the person paying for the prizes, but never mind.

This is mainly because we know how to make direct guesses. By that, I mean the new data we’re generating is a direct statement about what we think the probability is. What we didn’t have to do, for example, was to be given two sets of Ronald’s win frequencies, and have to guess at whether he was fishing from the same bucket each time. We could generate more win frequencies, but those aren’t something we can directly use as a statement of whether or not the bucket is the same.

This requires more clever methods, and these more clever methods don’t necessarily tend to the correct answer if we run them for longer. That might be because the method we decide to use is very good. It might be because the data we can generate is so uninformative about the answer, that deriving one is going to introduce errors, regardless of what we do. But that deserves a separate post.

Why didn’t I just have a few drinks first, and go straight to getting the right answer? Well, I could have done, but sometimes you don’t have that option. More on that another time.

What are Monte Carlo Methods?

“Monte Carlo methods” is a technical term for a technique you’ve used since you were a small child.

Say you flip a coin. How often do you expect it to come up heads? Well, if we assume the coin won’t land on its edge, each face has the same chance of coming up, so half the time.

Say you roll a six-sided die. How often do you expect it to land on six? Same sort of argument. One in six.

Now imagine you’re a kid, and I again ask you how often you expect to roll a six. By kid, I mean someone who won’t be able to work out the exact answer given above.

How would you try to answer this? I suspect you’d roll the die a couple of times, see how large a proportion of the rolls come up sixes, and use that as a guess.

If so, congratulations! You discovered Monte Carlo methods even before your enthusiasm for maths was killed by algebra.

That’s really all Monte Carlo methods are: instead of working out the exact answer, you take a few random samples and use them to make a guess.

If it’s too much of an imaginative stretch to need to do this for such a simple problem, say I put a board game in front of you where you move your piece according to a roll of several dice. I then ask you how often you’ll land on a particular square by the end of the game.

To guess at an answer, you can then play the game a couple of times, and keep track of how many times you land on the square in each game.

Past a certain point in schooling, it’s easy to forget about making estimates like this. The questions usually posed to you, when being taught algebra and probability, are set up so that you can obtain the exact answer by hand. You can’t make a Monte Carlo estimate in an exam, using just pencil and paper. It’s not technically illegal to take dice into an exam, but rolling them will drive everyone else in the room crazy.

Once you look at real-life applications, however, you often find problems where you just can’t find the exact answer. The maths involved might be horrendous, or impossible. You simply mightn’t know everything you’d need to know to calculate the answer, because the needed information is impossible to observe.

So, what do you do? Well, you go back to rolling dice. At least, you get a computer to roll dice for you.

Usually Monte Carlo methods are a bit more advanced – in particular, we can often get some idea of how accurate we expect the estimate to be – but this constitutes the basics.

Hang on, though. We need to generate a lot of random samples for this to work, right? Won’t that take a while?

Yes, it will. Monte Carlo methods can take a long time to give an accurate. If your problem involves, say, predicting the weather, then the equivalent of rolling a die is now running a complete weather simulation. Depending on how sophisticated the model is, and how far ahead you’re forecasting, that can take hours, or days.

However, this is much less time than we’d expect to take finding the exact solution. The kid takes a while to roll his dice, and he would have spent less time finding the exact answer if he knew how to calculate it. Since he doesn’t, however, what we really need to compare against is how long he’d wait before he could work out how to do so. That’s probably years away. What if you need an answer now? Get rolling.

There are ways we can reduce the time these estimates take. Different methods make better use of the samples, so we can get away with using less samples, and still estimate with the same accuracy. Some are easier to implement. Some are specially designed to work very well for whatever particular problem the designer is working on. Some make better use of the way that modern computers are built – I’ll talk about this in another post. It’s a very general framework, so there’s a lot of flexibility.

Since there is a lot of flexibility, what you use depends on the problem. So, we’d better look at one. Next time, I’ll give an example of using these methods on a slightly more complex set of problems, commonly known as Pólya’s Urn. That’s a technical term mathematicians invented to justify playing at lucky dip boxes all day, on which we’ll be using a technique you’ve used since you were a child. Statistics is a very serious subject.