Bayesian falsification and the strength of a hypothesis

At the end of my post about other ways of looking at probability, I showed you a graph of evidence against probability. This is the relevant graph:

[Graph: evidence plotted against probability]

Looking at this graph was one of the most useful things I’ve ever done as a Bayesian. It shows, as I explained, exactly where most of the difficulty is in proving things, in coming up with hypotheses, etc. Another interesting aspect is the symmetry, showing that, to a Bayesian, confirmation and disconfirmation are pretty much the same thing. Probability doesn’t have a prejudice for or against any hypothesis, you just gather evidence and slide along the graph. Naïvely, the concept of Popper falsification doesn’t look terribly useful or relevant or particularly true.

So whence comes the success of the idea?

We can never confirm a hypothesis, only falsify it. Thus goes the saying. And even the staunchest Bayesian may look at the symmetry of Bayes, at the success of Popper, and awkwardly mumble some excuse. Or maybe I’m being unkind to staunch Bayesians and they’ve already thought about this. Whichever the case, if we have already rescued a hypothesis from the dark pits of hugely negative evidence once and almost confirmed it, it doesn’t immediately seem that far-fetched that we may once again bring it back from the dead, after it’s been falsified.

Whence comes the asymmetry? I phrased it all in a fairly suggestive way. Let’s think about a real-life example of a falsified theory that we all can agree will never be dug back up from its grave: Newtonian mechanics. What prevents Newtonian mechanics from coming back?

And then the answer becomes obvious: quantum mechanics does. Special relativity and general relativity and quantum mechanics are dancing on top of Newtonian mechanics’ grave and will stay there until shot dead. And that’s the progress of science, in a nutshell. That’s the meaning of falsification, to a Bayesian. That’s why we can only falsify hypotheses, but never confirm them.

The only way to kill a hypothesis is by bringing a new one to life.

Let’s suppose we’re at a point where Newtonian mechanics seems to be pretty much confirmed: P(N|X) is quite high. Then we observe the “anomalous” precession of the perihelion of Mercury. That’s strange. Einstein looks at the character of physical law; he looks at Maxwell’s equations, in which disturbances propagate at a finite speed. Newtonian mechanics has action at a distance, doesn’t it? P(precession|NX) is fairly low, so observing that anomalous precession is quite vexing to the theory.

Then Einstein comes up with his general theory of relativity, one that seems to take on the Machian spirit of previous laws. An object travelling at constant velocity should not be able to distinguish the world from one where it’s stationary and the universe is travelling at constant and opposite velocity: that’s Newton. An object under constant acceleration should not be able to distinguish the world from one where it’s stationary and the universe is under constant and opposite acceleration, which means that accelerated matter ought to generate gravitational waves: that’s Einstein.

Then we start collecting evidence that is increasingly vexing to Newtonian mechanics: the aforementioned precession of the perihelion, the deflection of light by the Sun, the gravitational redshift of light. Newton does not account for this evidence, while General Relativity in fact predicts and entails it: P(E|NX) is close to zero, while P(E|G_RX) is close to unity. The likelihood ratio there is pretty lopsided. GR reproduces all of Newton’s successful predictions, and it makes new predictions that Newton does not make but that are observed nonetheless.

How, dear reader, how could Newton ever recover from that blow? Yes, at one point Newtonian mechanics hadn’t even been imagined, its prior probability ridiculously tiny. The same is true of General Relativity. Once GR came, Newtonian mechanics was once again driven down to the gutter. Is there any evidence that could bring it back up?

Of course not. Popperian falsification does not come from any asymmetry in Bayes’ Theorem itself. It comes from the fact that the negation of any given specific hypothesis is the disjunction of a huge number of alternatives. If you have observed evidence that your preferred hypothesis does not cover, you can be damn sure there’s an alternative that captures all of its successful predictions. That alternative also makes other predictions that your hypothesis forbids, in one sweeping blow.

The strength of a hypothesis is in what it prohibits, not in what it allows. Why is that true?

A given hypothesis only has so much probability it can allocate to different observations. A hypothesis that prohibits a lot of possible outcomes concentrates its probability mass more sharply on the outcomes it does allow.
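Here’s a minimal sketch of that bookkeeping. It uses a hypothetical six-outcome experiment (a die roll), which is my own example and not from the original argument. A permissive hypothesis spreads its unit of probability over all six faces. A restrictive one forbids four faces and piles its mass onto the remaining two.

```python
from fractions import Fraction

# Hypothetical experiment with six mutually exclusive outcomes (faces of a die).
OUTCOMES = range(1, 7)

# Permissive hypothesis: allows every outcome, so each face gets 1/6.
permissive = {o: Fraction(1, 6) for o in OUTCOMES}

# Restrictive hypothesis: forbids faces 3 to 6, so its mass piles onto faces 1 and 2.
restrictive = {o: Fraction(1, 2) if o in (1, 2) else Fraction(0) for o in OUTCOMES}

# Both hypotheses have exactly the same total probability mass to spend.
assert sum(permissive.values()) == 1
assert sum(restrictive.values()) == 1

observed = 2
ratio = restrictive[observed] / permissive[observed]
print(f"likelihood ratio, restrictive : permissive = {ratio}")  # 3
```

Had the observation been face 5, the restrictive hypothesis would have assigned it probability 0 and been falsified outright. Concentrating mass is a bet, and the factor of 3 is its payoff when it wins.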

But hypotheses don’t exist in a vacuum, so to speak. There are always other hypotheses. Suppose a given hypothesis H_0 predicts a given datum with probability P(D|H_0X)=0.6. That’s quite a bit, isn’t it? But Bayes’ Theorem goes:

P(H_0|DX) = \frac{P(D|H_0X)P(H_0|X)}{\sum_iP(D|H_iX)P(H_i|X)}

I don’t know what the posterior probability for H_0 is if I don’t know how well other hypotheses predict the data. If it turns out that P(D|H_1X)=0.7, then the likelihood ratio between those hypotheses is \frac{P(D|H_1X)}{P(D|H_0X)}=\frac 7 6, and the data favour H_1 over H_0. And if there’s some hypothesis H_2 such that P(D|H_2X)=0.95, then the likelihoods for that datum stand in the ratio 0.95:0.7:0.6 = 19:14:12, and the data no longer back H_1 up either.
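A quick check of that arithmetic follows. It needs Python 3.9 or later for math.lcm, and it uses exact fractions so no rounding creeps in. The last few lines also evaluate the full formula above. The priors there (0.5, 0.45, 0.05) are hypothetical and are chosen only to show the denominator at work.

```python
from fractions import Fraction
from math import gcd, lcm

# P(D | H_i X) from the text, as exact fractions.
p_h0, p_h1, p_h2 = Fraction("0.6"), Fraction("0.7"), Fraction("0.95")

# Likelihood ratio of H_1 against H_0.
print(p_h1 / p_h0)  # 7/6

def integer_ratio(*values):
    """Scale positive fractions to the smallest whole numbers in the same proportion."""
    scale = lcm(*(v.denominator for v in values))
    whole = [int(v * scale) for v in values]
    g = gcd(*whole)
    return [w // g for w in whole]

print(integer_ratio(p_h2, p_h1, p_h0))  # [19, 14, 12]

# Full posterior: each numerator is P(D|H_i X) P(H_i|X), and the denominator is their sum.
# These priors are hypothetical, chosen only for illustration.
priors = [Fraction("0.5"), Fraction("0.45"), Fraction("0.05")]
joints = [like * prior for like, prior in zip((p_h0, p_h1, p_h2), priors)]
posteriors = [j / sum(joints) for j in joints]
print([float(p) for p in posteriors])
```

With those made-up priors the posteriors come out as 24/53 ≈ 0.45 for H_0, 126/265 ≈ 0.48 for H_1 and 19/265 ≈ 0.07 for H_2. The datum favours H_2 most strongly, yet its small prior leaves it the least probable of the three.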

Hypotheses are in eternal competition amongst themselves. The hypothesis that concentrates the most probability mass in the actual observed outcomes will invariably be the winner, and that’s why the real strength of a hypothesis is in what it prohibits, not in what it allows. The bland, allows-everything hypothesis will always lose bits against the sure-thing hypothesis (the hypothesis that gives probability 1 to the observed outcomes and 0 to all others).

But if that’s the case, why isn’t the sure-thing hypothesis always the winner? Well, because of priors, of course. For every imaginable outcome in a set of mutually exclusive outcomes, there is a sure-thing hypothesis that predicts it with probability 1. This means that before the data are observed, all those hypotheses are on equal footing, and they all have very tiny priors. A perfectly rational agent will have an Occamian prior over these hypotheses, so that simpler hypotheses can be favoured when they predict the data with fairly high probability, even if not with probability one. This is especially true as new observations come in and cut the sure-thing hypotheses down even further.
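The following toy sketch of that argument rests on assumptions of my own: N coin flips and a simple “fair coin” hypothesis. It also posits one sure-thing hypothesis per possible sequence. All the sure-thing hypotheses together get a small prior EPS, split equally among them, which stands in for an Occamian penalty.

```python
from fractions import Fraction
from itertools import product

N = 10                    # number of coin flips (hypothetical)
EPS = Fraction(1, 1000)   # total prior mass shared by all sure-thing hypotheses (assumed)

# One sure-thing hypothesis per possible sequence; each gets an equal slice of EPS.
sequences = list(product("HT", repeat=N))
prior_each_sure = EPS / len(sequences)
prior_fair = 1 - EPS      # the simple "fair coin" hypothesis gets the rest

observed = tuple("HTTHHHTHTT")

# Joint probabilities P(D H | X) = P(D | H X) P(H | X).
joint_fair = prior_fair * Fraction(1, 2) ** N   # a fair coin gives every sequence 2^-N
joint_sure = prior_each_sure * 1                # the matching sure-thing hypothesis gives it 1
# Every other sure-thing hypothesis predicted a different sequence, so its joint is 0.

print(joint_sure / joint_fair)  # 1/999
```

The matching sure-thing hypothesis beats the fair coin by a likelihood factor of 2^N = 1024. However, its prior was split 1024 ways, so the posterior odds stay at EPS/(1 - EPS) no matter how large N gets.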

The strength of a hypothesis is in what it prohibits, not in what it allows, because if it doesn’t concentrate enough probability mass on the actual observed outcomes (and therefore takes a lot of probability mass from the forbidden ones), the hypotheses that do will beat it to death.

Who are they competing with, though? That’s a relatively silly question, one might feel. The hypotheses are always competing amongst themselves, that’s what I just said, isn’t it? And Bayes’ Theorem is symmetrical there. Let’s take a look at the three-hypothesis case:

P(H_0|DX) = \frac{P(D|H_0X)P(H_0|X)}{P(D|H_0X)P(H_0|X)+P(D|H_1X)P(H_1|X)+P(D|H_2X)P(H_2|X)}

Or, equally:

P(H_0|DX) = \frac{P(DH_0|X)}{P(DH_0|X)+P(DH_1|X)+P(DH_2|X)}

The probability of the conjunction of the hypothesis and the data is what’s actually competing there. Now suppose, in that case, that H_1 is winning and its probability is much larger than that of the other two. Then P(DH_1|X) dominates P(DH_2|X), and we can say that

P(H_0|DX) \approx \frac{P(DH_0|X)}{P(DH_0|X)+P(DH_1|X)}

That is, asymptotically, each hypothesis competes only with the most probable hypothesis other than itself. But even this might not be enough to visualise it. Let’s use a toy example.

Suppose there is some experiment that has two possible outcomes: Success and Failure. Furthermore, suppose we have some hypotheses H_i such that P(S|H_iX) = p_i and P(F|H_iX)=1-p_i. Then, if I run that experiment N times and n of the runs come up Success:

P(N_n|H_iX)={N\choose n}p_i^n(1-p_i)^{N-n}
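Here’s a minimal Python version of that likelihood. The function name is my own. It also checks two things that matter later: the probabilities over all n sum to 1, and the binomial coefficient cancels in any likelihood ratio between hypotheses.

```python
from math import comb, isclose

def likelihood(N, n, p):
    """P(N_n | H_i X): probability of exactly n Successes in N runs when P(S | H_i X) = p."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

# Sanity check: summed over every possible n, the probabilities add up to 1.
total = sum(likelihood(50, n, 0.2) for n in range(51))
print(round(total, 12))  # 1.0

# The binomial coefficient is the same for every hypothesis, so it cancels in ratios.
r = likelihood(50, 5, 0.2) / likelihood(50, 5, 0.1)
assert isclose(r, (0.2 / 0.1) ** 5 * (0.8 / 0.9) ** 45)
```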

Suppose, then, that I have three hypotheses such that p_0 = 0.1, p_1=0.2, and p_2=0.99. Further, suppose the priors for those three hypotheses are P(H_0|X)=\frac {10}{11}(1-10^{-6}),P(H_1|X)=\frac {1}{11}(1-10^{-6}), and P(H_2|X)=10^{-6}. Finally, suppose N = 50. What happens if I graph the evidence for each of these hypotheses as a function of n?

[Graph: evidence for H_0, H_1 and H_2 as a function of n, for N = 50]

Remember, evidence, measured in decibels, is given by the formula e(H|X) = 10\log_{10}O(H|X), where O(H|X) = \frac{P(H|X)}{P(\bar H|X)} are the odds for H. So:

e(H|DX) = e(H|X)+10\log_{10}\left[\frac{P(D|HX)}{P(D|\bar HX)}\right]
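The curves in the graph can be regenerated, at least approximately, by a short Python sketch. It applies the update above directly to the toy setup. The helper names are mine, not the original’s. P(D|\bar H_iX) is computed as the average of the other hypotheses’ likelihoods, weighted by their renormalised priors.

```python
from math import comb, log10

P_SUCCESS = [0.1, 0.2, 0.99]                                   # p_0, p_1, p_2
PRIORS = [10 / 11 * (1 - 1e-6), 1 / 11 * (1 - 1e-6), 1e-6]    # P(H_i | X)
N = 50

def likelihood(i, n):
    """P(N_n | H_i X), the binomial likelihood."""
    p = P_SUCCESS[i]
    return comb(N, n) * p**n * (1 - p) ** (N - n)

def evidence_db(i, n):
    """e(H_i | N_n X) = e(H_i | X) + 10 log10[P(D | H_i X) / P(D | not-H_i X)], in dB."""
    others = [j for j in range(len(PRIORS)) if j != i]
    prior_not = sum(PRIORS[j] for j in others)                 # P(not-H_i | X)
    prior_evidence = 10 * log10(PRIORS[i] / prior_not)         # e(H_i | X)
    # P(D | not-H_i X): the alternatives' likelihoods weighted by their renormalised priors.
    like_not = sum(PRIORS[j] * likelihood(j, n) for j in others) / prior_not
    return prior_evidence + 10 * log10(likelihood(i, n) / like_not)

for n in (0, 10, 20, 30, 40, 50):
    print(n, [round(evidence_db(i, n), 1) for i in range(len(PRIORS))])
```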

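The likelihoods P(N_n|H_iX) are exponential in n, and the binomial coefficient is the same for every hypothesis, so it cancels in every likelihood ratio. As a result, the evidence for each hypothesis is approximately a straight line in n as long as a single alternative dominates its denominator. The interesting thing is the behaviour of those lines when the dominant alternative changes.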

The green line is H_0, the blue one is H_1, and the red one is H_2. As you can see, H_2 isn’t even in the graph when n is small. At that point, H_0 and H_1 are competing amongst themselves, and H_2 is nowhere to be seen. As we observe more successes, however, H_2 gets more and more confirmation. Eventually, H_2 becomes more likely than H_0. At that point the blue line, which describes H_1, suddenly changes slope and starts losing probability.

Why does that happen? Because, as I suggested earlier, the hypotheses don’t really compete equally and fairly with one another. Rather, each is mostly competing with the most likely hypothesis other than itself. When H_2 becomes more likely than H_0, H_1 starts competing with H_2 instead, and starts losing. However, H_0 is still only competing with H_1, because that is still the most likely hypothesis other than H_0. When H_2 becomes more likely than H_1, even H_0 starts competing with it, because H_2 has become the most likely hypothesis other than H_0.
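To check the “strongest rival” story numerically, here’s a sketch that uses the same toy numbers; the helper names are my own. For each n, it finds the most probable hypothesis other than H_i. It also computes how many decibels H_i gains per extra Success against a single rival, which is the approximate slope of its line while that rival dominates. Working with log10 joint probabilities keeps the tiny numbers manageable.

```python
from math import comb, log10

P_SUCCESS = [0.1, 0.2, 0.99]
PRIORS = [10 / 11 * (1 - 1e-6), 1 / 11 * (1 - 1e-6), 1e-6]
N = 50

def log_joint(i, n):
    """log10 P(N_n H_i | X) = log10[P(N_n | H_i X) P(H_i | X)]."""
    p = P_SUCCESS[i]
    return log10(PRIORS[i]) + log10(comb(N, n)) + n * log10(p) + (N - n) * log10(1 - p)

def strongest_rival(i, n):
    """The most probable hypothesis other than H_i after n Successes."""
    return max((j for j in range(len(PRIORS)) if j != i), key=lambda j: log_joint(j, n))

def slope_db(i, j):
    """Decibels H_i gains per extra Success (one Failure turned into a Success) against H_j."""
    p, q = P_SUCCESS[i], P_SUCCESS[j]
    return 10 * (log10(p / q) - log10((1 - p) / (1 - q)))

for i in range(len(PRIORS)):
    rivals = [strongest_rival(i, n) for n in range(N + 1)]
    switches = [n for n in range(1, N + 1) if rivals[n] != rivals[n - 1]]
    print(f"H_{i}: strongest rival at n=0 is H_{rivals[0]}; it changes at n = {switches}")

print(round(slope_db(1, 0), 2), round(slope_db(1, 2), 2))
```

With these numbers, H_1’s strongest rival switches from H_0 to H_2 at n = 36, and H_0’s switches from H_1 to H_2 at n = 39, which is the order described above. H_2’s own rival changes from H_0 to H_1 at n = 11. While H_0 is its rival, H_1 gains about 3.5 dB per Success. Once H_2 takes over, H_1 loses about 26 dB per Success.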

And this agrees nicely with our intuition, too! When you’re collecting evidence and trying to prove or falsify a hypothesis, you don’t shoot for the underdog; you try to take down the leading one! People are trying to replace Quantum Mechanics and General Relativity, not Newtonian mechanics. Newton has lost. Sure, maybe my prospective Theory of Everything does better than Newton did, but who cares? Newton is no longer the relevant theory.

As a final example, let’s examine the four-hypothesis case, where p_3=0.7, P(H_3|X)=10^{-3}, and N=100. The yellow line is H_0, the red one is H_1, the light blue is H_2, and the dark blue is the new H_3.

[Graph: evidence for H_0 through H_3 as a function of n, for N = 100]

As you can see, at any given point, hypotheses only compete with their strongest opponent, and as soon as some other opponent takes the lead, everyone feels it. The strength of a hypothesis is in what it prohibits, not in what it allows, and the winning hypothesis is the one that will beat the other ones to death.

This entry was posted in Mathematics, Probability Theory, Rationality.