Winning While Losing: New Strategy Solves 'Two-Envelope' Paradox
August 18, 2009 By Lisa Zyga
Opening one envelope tells you more than you might think. Random switching based on the value of the envelope can help a player win more money in the long run. Image credit: Wikimedia Commons.
(PhysOrg.com) -- Researchers from Australia have taken a step toward resolving a seemingly simple yet unsolved paradox known as the "two-envelope" problem. They’ve worked out a new strategy that lets a player beat the game by increasing their payoff. The strategy could have applications in optimizing gains in investments and other areas.
Mark McDonnell of the University of South Australia and Derek Abbott of the University of Adelaide have published their results in a recent issue of Proceedings of the Royal Society A.
The Paradox
In the two-envelope paradox, a player must choose between two envelopes, one of which contains twice as much money as the other. The player can open the envelope they choose, and then they have the option of switching envelopes. The other envelope, of course, holds either twice or half as much money as the first envelope, but the player does not know which.
It may seem that, since a player has a 50-50 chance of choosing either envelope, they have an equal chance of gaining or losing money whether they decide to switch or keep the original envelope. However, probability theory seems to confusingly show that it’s always better to switch.
For example, say the first envelope you pick has $10, so that the other envelope has either $20 or $5. Then you can calculate the expected value (i.e. the probability-weighted sum of the possible values) of the second envelope, assuming that each possibility has a 50% chance: (0.5 x $5) + (0.5 x $20) = $12.50. Since $12.50 is more than $10, it appears to make sense to switch. No matter which numbers you use, the expected value of envelope two always comes out as 5/4 of the value of the original envelope: if c is the value of the original envelope, the expected value of the second envelope is (0.5 x [0.5c]) + (0.5 x [2c]) = (5/4)c. The difference is determined by the relation between the envelopes’ values, but it still doesn’t make sense to switch every time, since a player could just as well have started out with the second envelope in the first place, and the same argument would then advise switching back.
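The article's arithmetic can be checked in a few lines. The sketch below simply encodes the naive argument; the function name naive_switch_value is ours, not the researchers'. It reproduces the $12.50 and the factor 5/4 for any opened amount c, which is exactly the step the paradox calls into question: it treats c as fixed while assuming "half" and "double" are equally likely for every c.

```python
def naive_switch_value(c: float) -> float:
    """Expected value of the other envelope under the naive argument:
    the opened amount c is treated as fixed, and 'half' and 'double'
    are each given a 50% probability."""
    return 0.5 * (0.5 * c) + 0.5 * (2 * c)


if __name__ == "__main__":
    print(naive_switch_value(10))       # 12.5
    print(naive_switch_value(10) / 10)  # 1.25, i.e. 5/4
```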
Mathematicians have been trying to figure out the problem (or some variation of it) since 1930, though it was not expressed in the two-envelope format until 1988 by Harvard mathematics professor Sandy Zabell. Though several researchers have claimed to have found solutions to the paradox, no consensus has been reached and so the problem is still considered unsolved.
Randomized Switching
As McDonnell and Abbott suggest, the key to the paradox may lie in the moment the player looks inside the first envelope: knowing this information breaks the symmetry, since the envelopes are no longer identical. To demonstrate this idea, the researchers have worked out a formula that, over repeated plays, can increase a player’s chance of picking the envelope with the greater amount of money.
The researchers named the new method Cover’s strategy, since it originated with a suggestion by Stanford engineering professor Tom Cover during lunch. In the strategy, a player randomly switches envelopes with a probability that depends on the amount of money in the first envelope: the larger the amount, the less likely the player should be to switch. This holds even without knowing the distribution of the values, that is, how high or low they might be. Over 20,000 simulations, this strategy increased a player’s payoff compared with simple switching. The researchers also found that a deterministic switching strategy, where a player switches whenever the value of the first envelope is smaller than some predetermined threshold, also leads to a gain compared with never switching.
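A minimal Monte Carlo sketch of the two rules just described, in Python. Everything specific here is our assumption, not the researchers' setup: the smaller amount is drawn from an exponential distribution with mean 10, the Cover-style rule switches with probability exp(-v/10) after seeing v, and the threshold rule switches only below 15. The helper names (play, cover_style, threshold, always) are hypothetical. Each rule is compared with never switching on the same random draws.

```python
import math
import random


def play(rng, switch_prob, trials=100_000, mean_small=10.0):
    """Average payoffs (never switch, given rule) over many games.

    Assumption: the smaller amount x is exponential with mean `mean_small`
    and the envelopes hold x and 2x. `switch_prob(v)` is the probability of
    switching after seeing amount v in the opened envelope.
    """
    total_keep = 0.0
    total_rule = 0.0
    for _ in range(trials):
        x = rng.expovariate(1.0 / mean_small)
        # Open one of the two envelopes at random.
        if rng.random() < 0.5:
            opened, other = x, 2 * x
        else:
            opened, other = 2 * x, x
        total_keep += opened
        if rng.random() < switch_prob(opened):
            total_rule += other
        else:
            total_rule += opened
    return total_keep / trials, total_rule / trials


def cover_style(v):
    # Hypothetical decreasing switch probability, not the paper's function.
    return math.exp(-v / 10.0)


def threshold(v, t=15.0):
    # Deterministic rule: switch only if the opened amount is below t.
    return 1.0 if v < t else 0.0


def always(v):
    # Ignores the amount entirely: the naive "always switch" advice.
    return 1.0


if __name__ == "__main__":
    rng = random.Random(2009)
    for name, rule in [("always", always), ("Cover-style", cover_style),
                       ("threshold", threshold)]:
        keep, switched = play(rng, rule)
        print(f"{name:12s} never switch: {keep:6.2f}   rule: {switched:6.2f}")
```

Under these assumptions a short calculation gives expected gains over never switching (average 15) of about 0.69 per game for the Cover-style rule and about 1.34 for the threshold rule, while always switching gains nothing. Any strictly decreasing switching probability gives a positive expected gain; a poorly scaled one just makes the gain small.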
“The apparent paradox arose before because it didn't seem to make sense that opening an envelope and seeing $10 actually tells you anything, and therefore it seemed strange that your expected value of winning is $12.50 by switching,” Abbott told PhysOrg.com. “But we resolve this by explaining it in terms of symmetry breaking. Before the envelopes are opened, the situation is symmetrical, so it doesn't matter if you switch envelopes or not. However, once you open an envelope and use Cover's strategy, you break that symmetry, and then switching envelopes helps you in the long run (with multiple plays of the game).”
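One way to see the symmetry breaking Abbott describes is a short expected-gain calculation. In our notation (not necessarily the paper's), let X > 0 be the smaller amount, so the envelopes hold X and 2X; the opened envelope is equally likely to be either, and p(v) is the probability of switching after seeing v.

```latex
\begin{aligned}
\mathbb{E}[\text{gain}]
  &= \underbrace{\tfrac12\,\mathbb{E}\bigl[X\,p(X)\bigr]}_{\text{opened }X\text{, switch, gain }X}
   \;-\; \underbrace{\tfrac12\,\mathbb{E}\bigl[X\,p(2X)\bigr]}_{\text{opened }2X\text{, switch, lose }X} \\
  &= \tfrac12\,\mathbb{E}\bigl[X\,\bigl(p(X) - p(2X)\bigr)\bigr].
\end{aligned}
```

If p is constant (never switch, always switch, or a coin toss that ignores the amount), the bracket vanishes and switching gains nothing, which is the symmetric situation before the envelope is opened. If p is strictly decreasing, then p(X) > p(2X) for every X, so the expected gain is positive whatever the distribution of X, provided E[X] is finite; how large the gain is does depend on that distribution.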
The researchers explained that the strategy grows out of recent advances in the study of two-state switching phenomena in physics, engineering, and economics. For example, in stochastic control theory, random switching between two unstable states can result in a stable condition.
“When I had lunch with Tom Cover in 2003 and he suggested that his strategy ought to work, I thought he was nuts and refused to believe it,” Abbott said. “It was that counterintuitive that I thought it was crazy. But I went back to Australia and slowly came around to Cover's viewpoint after careful thought over the years. My expertise in Brownian ratchets was the key to me understanding the physical picture behind it.”
As Abbott explained, a Brownian ratchet is a physical device that can organize random particles to flow in a particular direction. “The trick with a Brownian ratchet is that again it uses the idea of breaking symmetry,” he said. “It is this idea that is behind the principle of the well-known ‘Parrondo's paradox,’ which shows that you can mix two losing games and yet win. This solution to the two-envelope problem is a breakthrough in the field of Parrondo's paradox.”
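Parrondo's paradox can be checked numerically. The sketch below uses the commonly cited capital-dependent version of the games (the article does not spell them out): game A is won with probability 1/2 - ε; game B is won with probability 1/10 - ε when the capital is a multiple of 3 and 3/4 - ε otherwise; here ε = 0.005. Rather than simulate, it computes each game's long-run drift exactly from the stationary distribution of the capital modulo 3. The function names are ours.

```python
def stationary_mod3(p_div3, p_other, iters=2000):
    """Stationary distribution of the capital modulo 3 for a game won with
    probability p_div3 when the capital is divisible by 3 and p_other
    otherwise (a win adds 1, a loss subtracts 1). Uses power iteration."""
    pi = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(iters):
        new = [0.0, 0.0, 0.0]
        for s in range(3):
            p_win = p_div3 if s == 0 else p_other
            new[(s + 1) % 3] += pi[s] * p_win
            new[(s - 1) % 3] += pi[s] * (1 - p_win)
        pi = new
    return pi


def drift(p_div3, p_other):
    """Long-run expected change in capital per play."""
    pi = stationary_mod3(p_div3, p_other)
    p_win = pi[0] * p_div3 + (pi[1] + pi[2]) * p_other
    return 2 * p_win - 1


EPS = 0.005
P_A = 0.5 - EPS                              # game A: one slightly biased coin
P_B_DIV3, P_B_OTHER = 0.1 - EPS, 0.75 - EPS  # game B: capital-dependent coins

drift_a = drift(P_A, P_A)
drift_b = drift(P_B_DIV3, P_B_OTHER)
# Random mixture: a fair coin decides whether each play uses game A or game B.
drift_mix = drift(0.5 * P_A + 0.5 * P_B_DIV3, 0.5 * P_A + 0.5 * P_B_OTHER)

if __name__ == "__main__":
    print(f"game A alone:  {drift_a:+.4f} per play")    # -0.0100
    print(f"game B alone:  {drift_b:+.4f} per play")    # -0.0087
    print(f"random A or B: {drift_mix:+.4f} per play")  # +0.0157
```

Both games lose on their own, yet the random mixture wins: playing A at random "shakes" the capital out of the multiple-of-3 state where game B's bad coin is used, which is the asymmetry-plus-randomness mechanism Abbott refers to.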
Winning While Losing
Although a player who has prior knowledge of the statistical distribution of the envelopes’ values can use the random switching strategy to win money, the significant point is that this knowledge isn’t necessary. “What is surprising is that our analysis shows that you can always improve your gain using Cover's method with ignorance of the ‘house limit’ (the highest value of money allowed) and of the statistical distribution the numbers obey,” Abbott said. “That is rather amazing. And the reason it is of importance is that engineers often have to consider what are called ‘blind optimization’ problems. And so our solution may stimulate new work in this area.”
Another type of optimization method that shares similarities with the two-envelope problem is financial investing in the stock market. For instance, in "volatility pumping," switching between poor investments can result in winning an exponentially increasing amount of money.
“Volatility pumping is a ‘toy model’ that you can't use exactly in its present form on the stock market,” Abbott explained. “However, it is a toy model that illustrates underlying mechanisms that are useful. It suggests the power of changing your portfolio of stocks periodically, buying low and selling high. Both the two-envelope process plus volatility pumping appear closely related to Brownian ratchet phenomena. They both exploit the interaction of asymmetry with randomness.”
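A standard textbook toy model of volatility pumping (our choice of example; the article does not give one) uses a stock that doubles or halves each period with equal probability, alongside cash that earns nothing. Held alone, the stock's typical long-run growth factor is the geometric mean √(2 × 0.5) = 1 per period, even though its arithmetic-mean factor is 1.25 (the same 5/4 as in the envelope calculation); cash also grows by a factor of 1. The sketch below shows that rebalancing to a fixed mix every period produces growth anyway. The function name growth_rate and its parameters are ours.

```python
import math


def growth_rate(fraction_in_stock, up=2.0, down=0.5):
    """Per-period geometric-mean growth factor of a portfolio rebalanced
    every period to hold `fraction_in_stock` in a stock (the rest in cash,
    which earns nothing). The stock multiplies by `up` or `down` with equal
    probability each period, independently of the past."""
    cash = 1.0 - fraction_in_stock
    up_factor = cash + fraction_in_stock * up
    down_factor = cash + fraction_in_stock * down
    # Equal probabilities: the geometric mean is the square root of the product.
    return math.sqrt(up_factor * down_factor)


if __name__ == "__main__":
    for f in (0.0, 0.5, 1.0):
        print(f"{f:.1f} in stock -> growth factor {growth_rate(f):.4f} per period")
```

This prints growth factors of 1.0000, 1.0607 and 1.0000: the 50/50 rebalanced portfolio grows by about 6% per period although neither holding grows on its own. The gain comes from rebalancing, selling some stock after it rises and buying after it falls, which is the "buying low and selling high" Abbott mentions.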
This insight also brings with it a number of open questions. For example, when playing a sequence of games, a player could modify the details of the strategy by continually updating the estimated distribution from which the envelopes’ values are chosen. Also, since the strategy relates to two-state switching in other fields, perhaps it may be possible to explain all these phenomena with a common mathematical framework.
More information: Mark D. McDonnell and Derek Abbott. “Randomized switching in the two-envelope problem.” Proceedings of the Royal Society A. doi:10.1098/rspa.2009.0312
Copyright 2009 PhysOrg.com.
All rights reserved. This material may not be published, broadcast, rewritten or redistributed in whole or part without the express written permission of PhysOrg.com.



?!?!?!?!?!?!
0.5 x $10 = $5
0.5 x $20 = $10
$10 + $5 = $15 ***** wtf ?
The math expressed on this wiki is correct, and produces the $12.50 you were looking for.
0.5*2*A + 0.5*0.5*A
A = 10, solution = $12.50
Sorry for all of my comments, I was having a hard time accepting Physorg made this error, jeez. /embarrassed that this is a science website.
(The wiki exposes other flaws in the original reasoning at that.)
After picking the first envelope, you are guaranteed at least half of the first envelope (not zero), with the prospect of getting twice that. In order for there to be no expected gain from switching, the second envelope would need to have a 50-50 probability of having either zero dollars or double the first envelope.
What is most notable is that the expected value of the entire game does not change after the first and second choices. Before the first choice, there is a 50% chance of getting $c and a 50% chance of getting $2c. Thus, the expected value of the game -- before any selection is made -- is $1.5c. This doesn't change after the initial choice has been made, as noted in the article.
What would be more paradoxical is if the expected value of the game did change after the selection of the first envelope. Since picking an envelope conveys no information about its value relative to the other envelope, there shouldn't be any change in the expected value of the game.
A variation on this game that destroys the so-called paradox would be to have someone pick one of two colors of chips (e.g., red and green). One of the chips is worth $c and the other has a 50-50 probability of being worth either $0 or $2c. In this case, the expected value of the game before the first selection is $c [or 0.5($c) + 0.5((0.5)$0 + (0.5)$2c)]; and it remains $c regardless of whether you switch chips or not, since each chip individually has an expected value of $c.
Now, if the "money" in each envelope was truly randomized (a number between 0 and infinity) then I think Cover's strategy would break down. It all depends on whether there is a social bias in play.
"Seeming" (a result of unconscious conclusions reached after several plays) can become a valid factor in deciding which envelope probably has more money than the other. Did they filter for that factor? I don't see how.
I believe it does make the assumption that there is some (although it does not have to be strictly so) uniformity to the probability driving the amounts of money put in the envelopes. Thankfully (for this strategy), the strategy will tend to work if such a uniformity exists even if you in no way understand it.
This paradox is easily resolved if one understands that if there is a conflict between predictions the prediction which employs full knowledge of the system takes precedence over one which employs only partial knowledge.
In this case the person playing the game has only partial knowledge, and based on this partial knowledge it does indeed seem to him that there is a gain in switching. But this gain is illusory, as is clearly seen from the perspective of the person who put the money in the envelopes: he knows the amounts, and he knows that no matter which envelope the player picks and how many times he switches, the player has an equal chance of ending up with the higher or the lower amount, and so will win half of the total on average.
Now, the above is only true if the possible amount of money is truly unbounded, as it is in the original paradox. If there is any limit, explicit or implicit (and there always is when humans play such games), switching does make sense in certain circumstances, and the strategy described in the article might indeed work.
Half the time you'll get X dollars; a quarter of the time you'll get 2*X; a quarter of the time you'll get X/2.
So [(1/2)(x)] + [(1/4)(2x)] + [(1/4)(x/2)] = EV = (9/8)x, which is greater than x, which is why this is not a paradox at all: you are not starting with EV = x to begin with.
...
Expected Value =
[(1/2)(x)] plus [(1/4)(2x)] plus [(1/4)(x/2)] = (9/8)x.
So given the parameters the expected value is greater than x dollars before any envelope is opened in the first place.
More:
http://www.integr...ters.htm
1. You get more.
2. You get less.
So it is a 50/50 chance.
Applying those probability calculations doesn't make sense because of this, in the same way that calculations designed for continuous statistical data don't work for discrete statistical data.
This reminds me of the famous Monty Hall problem.
This is what the expected value is telling you.
(20 + 5) / 2 = 12.5
There are two distinct concepts at play - your money and your chances of winning.
and not 10 and then either 15 or 5, i.e. gain 5 or lose 5.
In this case there are 3 (NOT 2) possibilities. One envelope for each of the denominated amounts 5, 10 and 20.
The way they have set this up is not the way people think it's set up - most people believe that there are only two kinds of envelope. They've been led to believe this by the conjuror's trick of distraction - the photograph, the frequent talk about 2 possibilities etc.
In other words it's a con.
Fix the envelopes BEFOREHAND at 3 denominations and calculate the odds - no difference when switching.
Likewise, fix the envelopes beforehand at 2 denominations and calculate the odds - no difference when switching.
In other words, in a range of numbers 0...infinity, the odds are 0 that you will pick a non-infinite number, just as, if you *randomly* pick a real number between 0 and 1, the odds are 0 that it will be rational, because there are an infinite number of real numbers between any two rational numbers.
Risk Reward says you switch every time.
Probability says you never switch as you already hold the average of the envelopes and have achieved the best possible outcome over an infinite decision result set.
The interesting piece here is that this thought process should be experimentally testable in a real world scenario utilizing network theory.
That would only be valid to the postulate if there was potential to receive a bill for $99,999,990 in the other envelope.
http://www.faqs.o...ecision/
Steady state
Double
Half
symmetry-breaking argument properly, but it seems to me to be one of those "paradoxes" which involve 0 or infinity in a disguised form. If, say, one of the numbers involved is selected at random from ALL numbers, then there's only an infinitesimal chance that it will be less than the number of particles in the universe. I vaguely remember Q Comp stuff saying that it would then be theoretically non-computable(?), i.e. not simply impractical.
So the alternative would be to bias the selection of the number towards smaller numbers. Then if the number in the first envelope was large, by some criterion, then there would be
But maybe mathematicians have ignored that, because the criterion seems to me to have to be a social one (how you do the experiment IN PRACTICE: how big the numbers really would be). How could the bias not be social rather than mathematical?
The number you get in the first envelope is not truly random. There is a 75% probability that the number is smaller than half of the maximum amount and a 25% probability that it is bigger than half of the maximum amount.
With true randomness you could even play double-or-nothing and win just about 0. In this case it is double-or-half, and you can win about 0 on average.
3 possible scenarios played out.
1 keep the tenner!!!
2 swap and get a 5
3 swap and get a 20
The probability of 2 and 3 are equal ...
However the average return is not 12.5.. it would be approximately 11.25.
My reasoning ...
A series of plays of the game where the envelopes contain 10 and 5, assuming you always open the 10 and swap half the time, results in an average return of (10 + 5) / 2 = 7.50.
A series of plays of the game where the envelopes contain 10 and 20, assuming you always open the 10 and swap half the time, results in an average return of (10 + 20) / 2 = 15.
And a series of the game where the 10 is always opened and swapped all of the time, with a 5 or a 20 each half the time, would give an average return of (7.5 + 15) / 2 = 11.25.
I think the game is supposed to be analogous to some class of physical problem. It seems likely that the game rules, as stated, don't model a real-world problem sufficiently to be very enlightening. It is under-constrained.
With additional constraints one could make better decisions. For example: envelopes may only contain a quantity of dollar bills. Envelopes cannot contain more than N bills. Envelopes look the same regardless of the quantity of bills contained. The object of the game is to maximize the quantity of bills collected for a set number of trials. Etc.
Without the additional rules the game model just isn't descriptive enough to match a real problem.
It sounds as if the Cover strategy is adaptive, discovering and remembering the 'house limit' as it goes along, effectively adding a constraint to the model (knowledge of the 'house limit').
They don't describe how it goes about this, but for the sake of discussion, I'll assume a simple model of remembering the greatest quantity of money found in an envelope and mapping the probability of switching to the range of zero to the remembered value.
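The comment's hypothetical model can be sketched directly. Everything here is an assumption for illustration, not the researchers' method: the switching probability falls linearly from 1 to 0 over the range up to the largest amount opened so far, the rule switches half the time before it has any history, the smaller amount is uniform on [1, 100], and the function names are ours.

```python
import random


def adaptive_switch_prob(opened, max_seen):
    """Hypothetical rule from the comment: the switching probability falls
    linearly from 1 at an opened amount of 0 to 0 at the largest amount seen
    so far. Before any history exists, switch half the time."""
    if max_seen <= 0:
        return 0.5
    return max(0.0, 1.0 - opened / max_seen)


def run(rng, games=100_000, low=1.0, high=100.0):
    """Play repeated games. Assumption: the smaller amount is uniform on
    [low, high] and the envelopes hold x and 2x. Returns the average payoff
    of never switching and of the adaptive rule, on the same draws."""
    max_seen = 0.0
    total_keep = 0.0
    total_adaptive = 0.0
    for _ in range(games):
        x = rng.uniform(low, high)
        if rng.random() < 0.5:
            opened, other = x, 2 * x
        else:
            opened, other = 2 * x, x
        total_keep += opened
        if rng.random() < adaptive_switch_prob(opened, max_seen):
            total_adaptive += other
        else:
            total_adaptive += opened
        # Remember the largest amount opened so far (the estimated house limit).
        max_seen = max(max_seen, opened)
    return total_keep / games, total_adaptive / games


if __name__ == "__main__":
    keep, adaptive = run(random.Random(2009))
    print(f"never switch: {keep:.2f}   adaptive rule: {adaptive:.2f}")
```

Under these assumptions the remembered limit quickly settles near 200, and the rule then switches more often on small amounts than on large ones, which is what gives it its edge. An opponent who deliberately inflates max_seen early on, as the comment goes on to describe, would push the rule toward switching almost every time and erode that edge.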
Such a strategy could be improved upon by expanding it further.
For example, if such a strategy were used against another player, that player might initially put very large quantities in the envelope in order to train the algorithm to expect a high house limit. The player would then proceed by putting only low values in the envelope, thereby causing the Cover strategy to very frequently choose to switch, greatly reducing its effectiveness.
The strategy can combat this by maintaining a memory of all the values it has seen, recognizing distribution patterns and choosing a switch probability based on the distribution discovered and the current value.
Again, this adds more constraints to the model (knowledge of the distribution of values).
Thus, no paradox. Purely intuitive.
First half of @paulthebassguy Aug.18,2009 sounds like this, too. Right. But, that is not about discrete vs. continuous, but value-judgment vs. absolute-value, I think.
If you take a long-term view (probability only, without weighing values), then you have to reflect from the start.
1) drawDoubleFirst vs. drawSingleFirst
2) switch vs. not-switch
The "paradox" ignores the first step. The question of switching's value must sum both.
X1 = firstWasDouble * switch + firstWasSingle * switch
X2 = firstWasDouble * notswitch + firstWasSingle * notswitch
X1 = X2
Thus, there is no paradox, again. No gain through a strategy of long-term switchings like that.
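The X1 = X2 argument can be made concrete by exact enumeration. For one fixed pair of envelopes holding x and 2x, the first pick is equally likely to be the single or the double amount. The sketch below (function name ours, and our reading of the comment's terms as payoffs weighted by 1/2) computes the expected payoff of keeping and of switching with exact fractions.

```python
from fractions import Fraction


def expected_payoffs(x):
    """Exact expected payoff of keeping vs. switching for one fixed pair of
    envelopes (x, 2x), where the first pick is equally likely to be either."""
    half = Fraction(1, 2)
    # (amount in the first envelope, amount in the other envelope)
    cases = [(x, 2 * x),   # firstWasSingle
             (2 * x, x)]   # firstWasDouble
    keep = sum(half * first for first, _ in cases)
    switch = sum(half * other for _, other in cases)
    return keep, switch
```

For example, expected_payoffs(10) returns Fraction(15, 1) for both keeping and switching: $15 either way, half of the $30 total.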
@Prsn Aug.19,2009 sounds like telling this, too.
Thus, there is no paradox. Do we three suffice?
The value (the ratio) of switching vs. not switching is 1. That is, the "$12.5" does not tell you what money will come, but what the value is. That value is the same if you ask "what is the value of not switching?": that is $12.5, too. In other words, holding $10 has a value of $12.5 when not switching, too. Thus, the question is not about absolute value (in dollar terms), but about weighing alternative options by their relative values.
Good for reflecting among a few options. Not necessarily only two.