[Edit: Doomsday and the Dice Room Murders supersedes this post, giving a more general analysis that allows for an unbounded population.]
Scott Aaronson has a wonderful book called Quantum Computing Since Democritus, and in his chapter on the anthropic principle he writes about a thought experiment (attributed to John Leslie) that he calls the Dice Room:
Imagine that there’s a very, very large population of people in the world, and that there’s a madman. What this madman does is, he kidnaps ten people and puts them in a room. He then throws a pair of dice. If the dice land snake-eyes (two ones), then he simply murders everyone in the room. If the dice do not land snake-eyes, then he releases everyone, then kidnaps 100 people. He now does the same thing: he rolls two dice; if they land snake-eyes, then he kills everyone, and if they don’t land snake-eyes, then he releases them and kidnaps 1000 people. He keeps doing this until he gets snake-eyes, at which point he’s done.
This scenario leads to the following question: if you are kidnapped, what is your probability of dying? He gives two answers. The first:
…the dice have a 1/36 chance of landing snake-eyes, so you should only be a “little bit” worried (considering.)
And the second:
…consider, of people who enter the room, what the fraction is of people who ever get out. Let’s say that it ends at 1000. Then, 110 people get out and 1000 die. If it ends at 10,000, then 1110 people get out and 10,000 die. In either case, about 8/9 of the people who ever go into the room die.
…We can say that we’re conditioning on a specific termination point, but that no matter what that point is, we get the same answer. It could be 10 steps or 50 steps, but no matter what the termination point is, almost all of the people who go into the room are going to die, because the number of people is increasing exponentially.
(Actually, $9/10$ rather than $8/9$—for example, $1000/1110 \approx 0.90$—so we'll use $9/10$ in the rest of this note.)
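A quick way to confirm this arithmetic is to compute, for several termination points, the fraction of those who ever entered the room who die. A sketch in Python (the range of termination points is an arbitrary choice):

```python
# If the process ends at step k, the people in batches 1 .. k-1 got out
# and the 10^k people in batch k died.  The fraction who die settles
# near 9/10 for every termination point.
def fraction_who_die(k):
    died = 10 ** k
    got_out = sum(10 ** i for i in range(1, k))
    return died / (died + got_out)

for k in range(2, 7):
    print(k, fraction_who_die(k))
```

The fraction is exactly $1$ at $k=1$ and decreases toward $9/10$ thereafter, so "about $9/10$ of those who enter die" holds no matter where the process stops.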
Aaronson comments that “If you’re a Bayesian, then this kind of seems like a problem,” as Bayesian reasoning seems to give two different answers to the same question.
Right off the bat, you should be smelling something rotten here: Bayesian reasoning is just unadorned probability theory, nothing more and nothing less, and we know that the rules of probability theory are logically consistent—or rather, if they’re not, then Zermelo-Fraenkel set theory is also inconsistent and mathematicians start throwing themselves off of tall buildings.
The other reason you should be skeptical of the above argument is that it’s purely verbal. It is extremely easy to make mistakes in applying probability theory if you just give intuitive verbal arguments without actually grinding through the details of the math. It’s easy to make mistakes even if you do most of the math except for some small step that seems obvious (see, for example, my analysis of the Marginalization Paradox.)
We shall see that the error in Aaronson’s (Leslie’s?) reasoning lies in dropping a conditioning proposition when assessing a probability; that is, an unconditional probability is implicitly used when a conditional probability is called for. The conditioning proposition at first glance may appear irrelevant, but it greatly alters the probability being assessed.
Let’s get explicit
To start the analysis, let’s be crystal clear on our assumptions:
- At step $i$ the madman kidnaps $10^i$ people, so the total number kidnapped through step $i$ is

$$N_i = \sum_{j=1}^{i} 10^j = \frac{10^{i+1} - 10}{9}.$$

That is, $N_i$ is the total number of people kidnapped up to the $i$-th step inclusive (if the process gets that far). This assumes that no person is ever kidnapped more than once, which appears to be implicit in Aaronson’s description.
- Assume that the population of the world is $N \equiv N_n = (10^{n+1} - 10)/9$ for some positive integer $n$. This assumption is just a convenience so that we don’t have to deal with any “left-overs.”
- Assume that $n$ is large enough that it is almost certain that the madman will get snake-eyes before he runs out of people to kidnap. For example, $n = 250$ gives a probability of about $1 - (35/36)^{250} \approx 0.999$ that the madman eventually gets snake-eyes.
- Writing $m_i$ for “the madman murders at step $i$” and $M_i$ for “the madman has already murdered by step $i$” (so that $M_i \equiv m_1 \vee \cdots \vee m_i$, with $M_0$ defined to be false), the madman’s choice process is as follows, for $1 \le i \le n$:

$$P(m_i \mid M_{i-1}) = 0, \qquad P(m_i \mid \neg M_{i-1}) = \frac{1}{36},$$

where “$\neg$” means “not.” That is, once the madman has murdered he does not do so again, and if he has not murdered in a prior step then there is a probability $1/36$ that he will murder in the current step.
- The madman selects victims in a random order. Let $\pi$ be a random permutation of the numbers from $1$ to $N$; that is, we take all $N!$ permutations of $1$ to $N$ to be equally probable values for $\pi$. If we imagine that the individuals in the population are numbered from $1$ to $N$, then $\pi(x) = 1$ means that individual $x$ is the first selected, $\pi(x) = 100$ means that individual $x$ is the 100th to be selected, and so on. We can always extend the random ordering out past the point where the slaughter occurs, so we take $\pi(x)$ to be meaningful regardless of whether or not individual $x$ is ever kidnapped. Since all permutations are equally likely, we have

$$P(\pi(x) = t) = \frac{1}{N}$$

for all integers $1 \le x, t \le N$.
- The random ordering defines batches of individuals, where batch 1 has the first $10$ individuals, batch 2 has the next $100$ individuals, and so on, batch $i$ having $10^i$ individuals. Define $b_x$ to be the number of the batch to which individual $x$ belongs; specifically,

$$b_x = i \quad\text{if and only if}\quad N_{i-1} < \pi(x) \le N_i$$

(taking $N_0 = 0$).
- Individual $x$ dies, written $D_x$, if the madman murders on the step corresponding to the batch to which individual $x$ belongs:

$$D_x \equiv m_{b_x}.$$
- Individual $x$ gets kidnapped, written $K_x$, if the madman has not yet murdered when he reaches the step corresponding to the batch to which individual $x$ belongs:

$$K_x \equiv \neg M_{b_x - 1}.$$
- You are some individual $x$, and the information you have available is that you have been kidnapped. Thus, the probability you are interested in is $P(D_x \mid K_x)$.
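The model above is concrete enough to simulate. Here is a Monte Carlo sketch in Python (the values of $n$, the number of trials, and the seed are arbitrary choices, not part of the problem statement). Since the dice rolls and the random permutation are independent, we can sample the murder step and your batch number separately: $P(b_x = i) = 10^i/N$, and the madman murders at the first of the $n$ steps on which a pair of dice lands snake-eyes.

```python
import random

def simulate(n=5, trials=200_000, seed=0):
    """Monte Carlo estimate of P(D_x | K_x) for the model above."""
    rng = random.Random(seed)
    N = (10 ** (n + 1) - 10) // 9                 # population size N_n
    cum = [(10 ** (i + 1) - 10) // 9 for i in range(1, n + 1)]  # N_1 .. N_n
    kidnapped = died = 0
    for _ in range(trials):
        # First snake-eyes step, or None if the madman never murders.
        step = next((i for i in range(1, n + 1)
                     if rng.randint(1, 6) == 1 and rng.randint(1, 6) == 1),
                    None)
        # Your position in the kidnapping order determines your batch:
        # b_x = i iff N_{i-1} < pi(x) <= N_i.
        t = rng.randint(1, N)
        b = next(i + 1 for i, N_i in enumerate(cum) if t <= N_i)
        if step is None or step >= b:             # K_x: madman reaches batch b
            kidnapped += 1
            if step == b:                         # D_x: murdered at step b
                died += 1
    return died / kidnapped

print(simulate())   # close to 1/36, i.e. about 0.028
```

With these settings the estimate lands near $1/36 \approx 0.028$, previewing the answer the formal analysis will justify.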
The first step in formalizing either of Aaronson’s arguments is to decompose the problem by batch:

$$P(D_x \mid K_x) = \sum_{i=1}^{n} P(D_x, b_x = i \mid K_x) = \sum_{i=1}^{n} P(m_i, b_x = i \mid K_x).$$

That is, the probability of being murdered is the sum of the probabilities of being murdered at each of the $n$ possible steps.
The next step is to apply the product rule to $q_i$, defined as

$$q_i \equiv P(m_i, b_x = i \mid K_x).$$

The product rule states that for any three propositions $A$, $B$, and $C$, we may decompose the probability that both $A$ and $B$ are true, conditional on $C$ being true, as

$$P(A, B \mid C) = P(A \mid B, C)\,P(B \mid C).$$

Since $A \wedge B$ is equivalent to $B \wedge A$, this gives two different possible decompositions, and that is where the two analyses differ:
- The first analysis factors $P(m_i, b_x = i \mid K_x)$ as

$$P(m_i \mid b_x = i, K_x)\,P(b_x = i \mid K_x)$$

and implicitly claims that

$$P(m_i \mid b_x = i, K_x) = \frac{1}{36}$$

regardless of $i$. (That is, knowing that $x$ belongs to batch $i$ and is kidnapped provides no information relevant to the probability of the madman murdering at step $i$ beyond establishing that he has not yet murdered by step $i - 1$.) Since individual $x$ certainly belongs to one of the batches, we then get

$$P(D_x \mid K_x) = \frac{1}{36}\sum_{i=1}^{n} P(b_x = i \mid K_x) = \frac{1}{36}.$$
- The second analysis factors $P(m_i, b_x = i \mid K_x)$ as

$$P(b_x = i \mid m_i, K_x)\,P(m_i \mid K_x)$$

and implicitly claims two things: first, that

$$P(b_x = i \mid m_i, K_x) \approx \frac{9}{10}$$

(if $x$ is kidnapped at some point and the madman murders at step $i$, then $x$ has a probability of about $9/10$ of being in batch $i$), and second, that

$$\sum_{i=1}^{n} P(m_i \mid K_x) \approx 1$$

(it is highly probable that the madman eventually murders, and knowing that $x$ is kidnapped does not change this). This then yields

$$P(D_x \mid K_x) \approx \frac{9}{10}\sum_{i=1}^{n} P(m_i \mid K_x) \approx \frac{9}{10},$$

in contradiction to the earlier answer of $1/36$.
To resolve the apparent paradox, we then have to determine which of these three claims are valid and which are not:

Claim 1. $P(m_i \mid b_x = i, K_x) = 1/36$ for all $1 \le i \le n$.

Claim 2. $P(b_x = i \mid m_i, K_x) \approx 9/10$ for all $1 \le i \le n$.

Claim 3. $\sum_{i=1}^{n} P(m_i \mid K_x) \approx 1$.
All three of the claims involve conditional probabilities. To evaluate the claims we need to think about (conditional) independence of variables. Two propositions $A$ and $B$ are independent, conditional on $C$, if

$$P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$$

or, equivalently, if

$$P(A \mid B, C) = P(A \mid C).$$

Two variables $X$ and $Y$ are independent, conditional on $C$, if the propositions $X = u$ and $Y = v$ are independent, conditional on $C$, for any possible pair of values $u$ and $v$.
The figure below shows the Bayesian network graph for our problem for a small value of $n$. There is a node for every variable, and for every variable there are arcs to it from those variables used in its immediate definition. You can read off conditional independencies directly from this graph, using the notion of d-separation [1], although I won’t go into all the details here.
The important point we’ll use is that if every undirected path between two variables passes through a “collider” node that is not part of the information you are conditioning on ($C$ in the above discussion), then those two variables are conditionally independent. A collider node on an undirected path is a variable whose two adjacent arcs on the path both point into it. For example, in the undirected path

$$b_x \rightarrow K_x \leftarrow M_i$$

between $b_x$ and $M_i$, the variable $K_x$ is a collider node.
Verifying Claim 1
We first note that, for $1 \le i \le n$,

$$(b_x = i) \wedge K_x \;\equiv\; (b_x = i) \wedge \neg M_{i-1}.$$

If you belong to batch $i$, then you are kidnapped if and only if the madman has not murdered before step $i$. So

$$P(m_i \mid b_x = i, K_x) = P(m_i \mid b_x = i, \neg M_{i-1}).$$

From the Bayesian network graph we can see that $m_i$ is independent of $b_x$ conditional on $M_{i-1}$: all undirected paths between $b_x$ and $m_i$ include either the collider $K_x$ or the collider $D_x$, and we are not conditioning on either $K_x$ or $D_x$. So we have

$$P(m_i \mid b_x = i, \neg M_{i-1}) = P(m_i \mid \neg M_{i-1}) = \frac{1}{36},$$

and Claim 1 is verified.
This tells us that the first analysis is, in fact, correct:

$$P(D_x \mid K_x) = \sum_{i=1}^{n} \frac{1}{36}\,P(b_x = i \mid K_x) = \frac{1}{36}.$$
Verifying Claim 2
We first note that, for $1 \le i \le n$,

$$m_i \wedge K_x \;\equiv\; m_i \wedge (b_x \le i).$$

If the madman murders at step $i$, then you are kidnapped if and only if you are in batch $i$ or an earlier batch. So

$$P(b_x = i \mid m_i, K_x) = P(b_x = i \mid m_i, b_x \le i).$$

All undirected paths between $b_x$ and $m_i$ are again blocked by one of the colliders $K_x$ or $D_x$, so $b_x$ and $m_i$ are independent conditional on $b_x \le i$, giving

$$P(b_x = i \mid m_i, b_x \le i) = P(b_x = i \mid b_x \le i) = \frac{10^i}{N_i} = \frac{9}{10 - 10^{1-i}} \approx \frac{9}{10},$$

and Claim 2 is verified.
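As a quick numerical illustration of the last step (my own check; the chosen values of $i$ are arbitrary), the exact batch fraction $10^i/N_i$ converges very rapidly to $9/10$:

```python
# Claim 2's exact value is 10^i / N_i.  It equals 1 for i = 1 and is
# within 1% of 9/10 for every i >= 2.
def batch_fraction(i):
    N_i = (10 ** (i + 1) - 10) // 9
    return 10 ** i / N_i

for i in (1, 2, 3, 5, 10):
    print(i, batch_fraction(i))
```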
Claim 3: not verified
As you have no doubt guessed by now, Claim 3 does not check out. Although it is true that

$$\sum_{i=1}^{n} P(m_i) = 1 - \left(\frac{35}{36}\right)^{n} \approx 1,$$

we cannot substitute $P(m_i)$ for $P(m_i \mid K_x)$: $m_i$ and $K_x$ are dependent, due to the undirected paths between them that run through the variables $M_i, \ldots, M_{n-1}$ (among others). This is where the second analysis falls down.
In fact, we find that

$$P(\neg M_n \mid K_x) \approx 0.97;$$

that is, if you know that you are kidnapped, but you don’t know at which step, then it is highly likely that the madman never murders anyone at all! Furthermore, this holds no matter how large you make $n$. The intuition behind this is that it is highly likely that you are in batch $n$, as batch $n$ contains about 9 times as many people as all other batches combined. But if you are in batch $n$ and are kidnapped, then there is only a probability of $1/36$ that the madman will roll snake-eyes at this final step; most likely he will not roll snake-eyes and, having run out of people to kidnap, will not commit the planned murder.
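This counterintuitive result can itself be checked by simulation. A sketch in Python, estimating $P(\text{no murder ever} \mid K_x)$ directly (the parameters $n$, the trial count, and the seed are arbitrary choices, and a uniform draw below $1/36$ stands in for the pair of dice):

```python
import random

def p_no_murder_given_kidnapped(n=6, trials=300_000, seed=1):
    """Estimate P(madman never murders | you are kidnapped)."""
    rng = random.Random(seed)
    N = (10 ** (n + 1) - 10) // 9
    kidnapped = no_murder = 0
    for _ in range(trials):
        # First snake-eyes step, or None if he never rolls snake-eyes.
        step = next((i for i in range(1, n + 1) if rng.random() < 1 / 36), None)
        t = rng.randint(1, N)                  # your kidnapping position
        b = next(i for i in range(1, n + 1)
                 if t <= (10 ** (i + 1) - 10) // 9)
        if step is None or step >= b:          # you are kidnapped
            kidnapped += 1
            if step is None:
                no_murder += 1
    return no_murder / kidnapped

print(p_no_murder_given_kidnapped())   # about 0.97
```

Conditional on being kidnapped, the great majority of simulated runs end with the madman running out of people before ever rolling snake-eyes.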
Details on Claim 3
Having given the hand-wavy verbal argument, let’s grind through the math. Using the standard identity for conditional probabilities,

$$P(m_i \mid K_x) = \frac{P(m_i)\,P(K_x \mid m_i)}{P(K_x)} = \frac{P(m_i)\,P(b_x \le i \mid m_i)}{P(K_x)} = \frac{P(m_i)\,P(b_x \le i)}{P(K_x)},$$

with the last step justified by the fact that $b_x$ and $m_i$ are independent, as discussed in verifying Claim 2. But

$$P(K_x) = \sum_{j=1}^{n} P(b_x = j,\, \neg M_{j-1}).$$

We can check from the graph that $b_x$ and $M_{j-1}$ are independent, so

$$P(K_x) = \sum_{j=1}^{n} P(b_x = j)\,P(\neg M_{j-1}) = \sum_{j=1}^{n} \frac{10^j}{N}\left(\frac{35}{36}\right)^{j-1}.$$

Putting these two equalities together gives

$$P(m_i \mid K_x) = \frac{\dfrac{1}{36}\left(\dfrac{35}{36}\right)^{i-1}\dfrac{N_i}{N}}{\displaystyle\sum_{j=1}^{n} \frac{10^j}{N}\left(\frac{35}{36}\right)^{j-1}}.$$

Summing over $i$, both the numerator and denominator sums are dominated by their last few terms, and we find $\sum_{i=1}^{n} P(m_i \mid K_x) \approx \frac{10}{9}\cdot\frac{1}{36} \approx 0.031$, so that $P(\neg M_n \mid K_x) \approx 0.97$.
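The final formula is easy to evaluate numerically. A sketch in Python using exact rational arithmetic (the choice $n = 30$ is arbitrary; the result is insensitive to it once $n$ is moderately large):

```python
from fractions import Fraction as F

n = 30
N = (10 ** (n + 1) - 10) // 9
# P(K_x), from the sum derived above.
P_K = sum(F(10 ** j, N) * F(35, 36) ** (j - 1) for j in range(1, n + 1))

def p_mi_given_K(i):
    """P(m_i | K_x) from the displayed formula."""
    N_i = (10 ** (i + 1) - 10) // 9
    return F(1, 36) * F(35, 36) ** (i - 1) * F(N_i, N) / P_K

total = sum(p_mi_given_K(i) for i in range(1, n + 1))
print(float(total))       # sum_i P(m_i | K_x): about 0.031
print(1 - float(total))   # P(no murder ever | K_x): about 0.97
```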
In summary, we have found that Aaronson’s first analysis of the Dice Room scenario is correct, and the second analysis, based on considering what proportion of the people who ever go into the Dice Room die, is not. The latter analysis implicitly (and erroneously) reduces the conditional probability $P(m_i \mid K_x)$ (the probability that the madman murders at step $i$, given that individual $x$ is kidnapped) to the unconditional probability $P(m_i)$. The conditioning information $K_x$ turns out to strongly affect the probability of $m_i$.
1. Geiger, Dan, Thomas Verma, and Judea Pearl (1990). “Identifying independence in Bayesian networks,” Networks 20, pp. 507–534.