The LessWrong community has a deeply ingrained instinct, almost a rule, that we should never “agree to disagree” about factual matters. The map is not the territory, and if we disagree about the territory, then at least one of our maps is incorrect. You will also see us citing this mysterious “Aumann’s Agreement Theorem.”
I wish to explain this. Aumann’s theorem says, broadly speaking, that two rational agents who share the same priors and whose posteriors are *common knowledge* cannot agree to disagree on any factual matter. You’ll notice that I ominously italicised the words “common knowledge.” There is a good reason for that.
Common knowledge is a much stricter condition than it sounds. Suppose you and I are reasoning about some proposition A. My background knowledge is given by $X$ and yours is given by $Y$, and we have that $P(A|X) = p$ and $P(A|Y) = q$. A proposition C is called common knowledge of two agents with respect to some other proposition A if:
- C implies that you and I both know C.
- I would have assigned probability $p$ to A no matter what I saw in addition to C.
- You would have assigned probability $q$ to A no matter what you saw in addition to C.
…this doesn’t sound very useful. When it’s put that way, it’s pretty clear that the theorem is true. I mean, that’s basically saying that, for any proposition $E$, I would have that $P(A|C \wedge E) = p$, and something similar would go for you. Since we’re both rational agents, that’d mean that $p = q$.
Aumann’s theorem sounds much cooler in his original description. He said something like “I know your posteriors, you know that I know your posteriors, I know that you know that I know your posteriors, and so on ad infinitum,” and vice-versa, and that’s the definition of “common knowledge.” What Aumann didn’t mention in that description is that he meant all your posteriors and all my posteriors. About everything. McAllister proved that both definitions of “common knowledge” are equivalent here, but a quick explanation of the idea goes like this:
Aumann imagined a space of all possible worlds (that is, all possible descriptions of every single fact about anything ever). My knowledge partitions that space into subsets, one of which contains exactly the universes allowed by my knowledge; the actual real universe is in there somewhere. You, with your knowledge, have such a subset too. Therefore, the actual real universe is somewhere in the intersection between my set of “possible universes” and your set of “possible universes.” In his proof, Aumann assumed that I know your partition and you know mine, which means we both know what the intersection is, so we know everything the other knows about everything ever. This is equivalent to McAllister’s condition if you take his magical proposition C to be exactly that intersection. So this “common knowledge” condition is tantamount to mind melding, and yeah, if two rational agents mind meld, they’ll indeed agree about everything!
And as Wei Dai commented at the end of his post, that’s not very useful. We cannot, in fact, mind meld with other rationalists and end up with the same beliefs. Even if I know your posteriors for A, and you know mine, and they’re common knowledge, that is still not a strong enough condition: the theorem effectively requires that all the posteriors bearing on A be common knowledge. We don’t live in an ideal world where we can email each other the entirety of our beliefs.
Still, there ought to be some way to use another agent’s beliefs as evidence. The modesty argument is the position that if you and your peer find yourselves disagreeing, you should update towards them, and they towards you, and you keep doing that until you agree. The linked post mentions a(n arguably) clear counterpoint to that: if you’re talking to a creationist, you probably shouldn’t average your beliefs and believe creationism with 50% probability. You really are in fact right, and the creationist really is in fact quite wrong.
But sometimes, not only is the best combination not the average, it’s more extreme than either original belief.
Let’s say Jane and James are trying to determine whether a particular coin is fair. They both think there’s an 80% chance the coin is fair. They also know that if the coin is unfair, it is the sort that comes up heads 75% of the time.
Jane flips the coin five times, performs a perfect Bayesian update, and concludes there’s a 65% chance the coin is unfair. James flips the coin five times, performs a perfect Bayesian update, and concludes there’s a 39% chance the coin is unfair. The averaging heuristic would suggest that the correct answer is between 65% and 39%. But a perfect Bayesian, hearing both Jane’s and James’s estimates – knowing their priors, and deducing what evidence they must have seen – would infer that the coin was 83% likely to be unfair.
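As a sketch of both updates in Python (the flip counts here are my assumption, inferred from the quoted posteriors: Jane saw 5 heads and James saw 4 heads out of 5 flips):

```python
def p_unfair(heads, flips, prior_unfair=0.2, p_heads_if_unfair=0.75):
    """Posterior probability that the coin is unfair, via Bayes' theorem."""
    like_unfair = p_heads_if_unfair ** heads * (1 - p_heads_if_unfair) ** (flips - heads)
    like_fair = 0.5 ** flips
    joint_unfair = prior_unfair * like_unfair
    return joint_unfair / (joint_unfair + (1 - prior_unfair) * like_fair)

jane, james = p_unfair(5, 5), p_unfair(4, 5)
print(round(jane, 2), round(james, 2))  # 0.65 0.39

# The perfect Bayesian recovers each likelihood ratio from the shared
# posteriors (posterior odds divided by prior odds) and multiplies them:
prior_odds = 0.2 / 0.8
combined_odds = (prior_odds
                 * ((jane / (1 - jane)) / prior_odds)
                 * ((james / (1 - james)) / prior_odds))
print(round(combined_odds / (1 + combined_odds), 2))  # 0.83
```

Averaging would have landed between 39% and 65%; treating the two data sets as independent evidence lands above both.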
The suggestive title of the article is the point I’m going to make. In a previous post I defined a mathematical object called the likelihood ratio. Given data D and a hypothesis H, the likelihood ratio for D is:

$$\mathcal{L}(D) = \frac{P(D|H)}{P(D|\neg H)}$$
That is, it’s the ratio between how likely the data is to be observed when the hypothesis is true as opposed to when it’s false. In the coin example, if James told Jane that his observed evidence had a likelihood ratio of $\frac{81}{32} \approx 2.5$ against fairness, and Jane told James that her observed evidence had a likelihood ratio of $\frac{243}{32} \approx 7.6$ against fairness, then they’d conclude that their combined evidence has ratio $\frac{243}{32} \cdot \frac{81}{32} = \frac{19683}{1024} \approx 19.2$, and so their posterior odds are $\frac{1}{4} \cdot 19.2 = 4.8$, which gives us probability $\frac{4.8}{5.8} \approx 83\%$ that the coin is unfair.
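The protocol is short enough to write down as a general rule (a sketch, using the ratios 243/32 and 81/32 for Jane and James, which follow from the posteriors quoted earlier):

```python
# Pooling shared likelihood ratios: with independent data, ratios
# multiply, and posterior odds = prior odds × combined ratio.

def pool(prior_odds, ratios):
    """Posterior probability after multiplying in each shared likelihood ratio."""
    odds = prior_odds
    for r in ratios:
        odds *= r
    return odds / (1 + odds)

# Coin example: prior odds of unfairness are 0.2/0.8 = 1/4.
print(round(pool(1 / 4, [243 / 32, 81 / 32]), 2))  # 0.83
```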
This is much more useful and much simpler than sharing posterior beliefs and averaging them, or mailing your friend a summary of everything you believe that could be remotely relevant to the matter at hand. Of course, there’s a trade-off in having to keep prior beliefs separate from likelihood ratios: instead of keeping just the condensed form of a posterior distribution for every belief, you need to keep track of twice as much information. Salamon and Rayhawk use this as an example:
Person A: So, what do you think of Jack?
Person B: My guess is that he’s moderately (smart / trustworthy / whatever), but not extremely so.
Person A: Is the “not extremely so” because you observed evidence Jack isn’t, or because most people aren’t and you don’t have much data that Jack is? Where’s the peak of your likelihood function?
I had a similar conversation with raginrayguns recently, where I said I thought most philosophy was silly, and he said that most philosophy he read was quite cool. I asked him what his likelihood ratio for its coolness was. He said that he had found cool stuff almost every time he’d tried to read philosophy, and pointed out that if his filters were good enough then he’d see lots of cool philosophy even if most of it was silly. After some discussion, the conclusion was that, taking into account how good he expected his filters to be and how much silliness he’d found nonetheless, his evidence gave him a ratio slightly in favour of the overall coolness of philosophy. Plus, we reached the mildly surprising conclusion that if his filters were good enough then his data had a likelihood ratio of less than unity: if, even with very strong filters, he had still found silly philosophy, that was strong evidence of a predominance of silliness.
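The filter argument fits in a toy model (all numbers here are my own invention, purely for illustration): suppose the filter always passes cool philosophy but lets silly philosophy through only 10% of the time, and compare a “mostly silly” world (90% silly) with a “mostly cool” one (10% silly).

```python
def p_read_silly(frac_silly, filter_leak=0.1):
    """Probability that a filtered reading turns out to be silly."""
    silly = frac_silly * filter_leak   # silly item slips past the filter
    cool = 1 - frac_silly              # cool item always passes
    return silly / (silly + cool)

p_if_mostly_silly = p_read_silly(0.9)  # philosophy is 90% silly
p_if_mostly_cool = p_read_silly(0.1)   # philosophy is 10% silly

# Likelihood ratios in favour of "mostly silly", per single reading:
lr_silly_reading = p_if_mostly_silly / p_if_mostly_cool              # ≈ 43
lr_cool_reading = (1 - p_if_mostly_silly) / (1 - p_if_mostly_cool)   # ≈ 0.53
print(round(lr_silly_reading, 1), round(lr_cool_reading, 2))
```

Each cool reading is only weak evidence of overall coolness, while a single silly reading that slipped past a strong filter is strong evidence of silliness — which is exactly the mildly surprising conclusion.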
Now, we don’t share the same priors. My priors were that most philosophy was silly. My posteriors still are, but now I’m slightly less convinced of that. But the point is that, by sharing the likelihood ratio, you’re basically sharing the evidence you collected. It’s the next best thing to just showing each other the actual data.
This type of dialog is useful. Let’s say that A’s initial impression is that Jack is amazing, and B’s impression is that Jack is somewhat less amazing. If B knows Jack well, A should lower her estimate of Jack. But if B’s impression comes from a tiny amount of amazing-looking data from Jack — just not enough to pull Jack all the way from “probably normal” to “probably amazing” — A should raise her estimate. B’s posterior expectations about Jack’s amazingness are identical in the two cases, even though B’s observations in the two cases have opposite implications for A. Trading likelihoods notices the difference, but trading average posterior impressions doesn’t.
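Salamon and Rayhawk’s point can be checked with a toy three-hypothesis model (all numbers invented): B’s posterior peaks at “moderate” in both cases, yet the two likelihood functions push A in opposite directions.

```python
PRIOR = {"normal": 0.7, "moderate": 0.25, "amazing": 0.05}

def posterior(*likelihoods):
    """Combine the shared prior with any number of independent likelihood functions."""
    weights = dict(PRIOR)
    for like in likelihoods:
        weights = {h: weights[h] * like[h] for h in weights}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

# Case 1: B knows Jack well, and the data actively argues against "amazing".
b_well_known = {"normal": 0.1, "moderate": 1.0, "amazing": 0.2}
# Case 2: B barely knows Jack; the little data looks amazing, but the
# prior keeps B's posterior peaked at "moderate" anyway.
b_thin_data = {"normal": 0.2, "moderate": 0.6, "amazing": 1.0}
# A's own data on Jack looks quite amazing.
a_own = {"normal": 0.1, "moderate": 0.5, "amazing": 1.0}

case1, case2 = posterior(b_well_known), posterior(b_thin_data)
print(max(case1, key=case1.get), max(case2, key=case2.get))  # moderate moderate

# Trading likelihoods instead of posterior impressions lets A react
# oppositely to the two cases:
print(round(posterior(b_well_known, a_own)["amazing"], 2))  # 0.07
print(round(posterior(b_thin_data, a_own)["amazing"], 2))   # 0.36
```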
So the likelihood ratio contains the relevant information for inference, and if there’s something that should become a standard tool in the rationalist social toolkit it’s this sharing of ratios.
(And of course, if you still disagree after sharing likelihood ratios, then either you did not have the same priors (as happened between me and raginrayguns) or at least one of you is doing something very wrong.)