Absence of evidence is evidence of absence

The W’s article about Evidence of Absence is confusing. They have an anecdote:

A simple example of evidence of absence: A baker never fails to put finished pies on her windowsill, so if there is no pie on the windowsill, then no finished pies exist. This can be formulated as modus tollens in propositional logic: P implies Q, but Q is false, therefore P is false.

But then they go on to say: “Per the traditional aphorism, ‘absence of evidence is not evidence of absence’, positive evidence of this kind is distinct from a lack of evidence or ignorance[1] of that which should have been found already, had it existed.[2]”

And at this point I go all ?????.

And then they continue with an Irving Copi quote: “In some circumstances it can be safely assumed that if a certain event had occurred, evidence of it could be discovered by qualified investigators. In such circumstances it is perfectly reasonable to take the absence of proof of its occurrence as positive proof of its non-occurrence.”

UM.

Alright, so, trying to untangle this mess: they seem to want to make a qualitative distinction between “high-expectation evidence” and “low-expectation evidence.” Now, if you have read other posts on this blog – about Bayes’ Theorem, the Bayesian definition of evidence, the many ways to look at probability – you must know by now that probability theory has no qualitative distinctions. Everything is quantitative. Any sharp divisions are strictly ad hoc and arbitrary, not natural clusters in conceptspace.

Thankfully, there is another quote in that W article that’s closer to the mark:

If someone were to assert that there is an elephant on the quad, then the failure to observe an elephant there would be good reason to think that there is no elephant there. But if someone were to assert that there is a flea on the quad, then one’s failure to observe it there would not constitute good evidence that there is no flea on the quad. The salient difference between these two cases is that in the one, but not the other, we should expect to see some evidence of the entity if in fact it existed. Moreover, the justification conferred in such cases will be proportional to the ratio between the amount of evidence that we do have and the amount that we should expect to have if the entity existed. If the ratio is small, then little justification is conferred on the belief that the entity does not exist. [For example] in the absence of evidence rendering the existence of some entity probable, we are justified in believing that it does not exist, provided that (1) it is not something that might leave no traces and (2) we have comprehensively surveyed the area where the evidence would be found if the entity existed…[5]

—J.P. Moreland and W.L. Craig, Philosophical Foundations for a Christian Worldview

This looks much more like Bayesian reasoning than the rest of that article did. But let’s delve deeper and see how to prove a negative.
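Just to make the quantitative point concrete before going on, here's a minimal sketch of that elephant/flea asymmetry in terms of Bayes' Theorem. The numbers are entirely made up; what matters is that the strength of the update is controlled by how likely we'd be to spot the thing if it were really there.

```python
def posterior_given_no_sighting(prior, p_sight_if_present):
    """P(thing is there | we looked and saw nothing), by Bayes' Theorem.

    Assumes no false positives: P(sighting | thing is absent) = 0,
    so P(no sighting | absent) = 1.
    """
    p_no_sight_if_present = 1 - p_sight_if_present
    numerator = p_no_sight_if_present * prior      # P(no sighting | present) P(present)
    evidence = numerator + 1.0 * (1 - prior)       # + P(no sighting | absent) P(absent)
    return numerator / evidence

# Same 50% prior for both; wildly different expectations of actually seeing the thing.
print(posterior_given_no_sighting(prior=0.5, p_sight_if_present=0.99))  # elephant: ~0.01
print(posterior_given_no_sighting(prior=0.5, p_sight_if_present=0.01))  # flea: ~0.497
```

Failing to see the elephant is strong evidence of absence; failing to see the flea is almost no evidence at all. Same kind of evidence, differing only in degree.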

Continue reading

Posted in Basic Rationality, Mathematics, Probability Theory, Rationality

Beliefs and aliefs

What does it mean to believe?

This is not supposed to be some Deeply Wise prod to make someone write philosophical accounts of the mystical uniqueness of human consciousness or some such. It’s an actual question about the actual meaning of the actual word. Not that words have intrinsic meanings, of course, but what do we mean when we use this word?

And like many good words in the English language, it has a lot of meanings.

LessWrong has a lot of talk about this. Amongst the meanings of the verb “to believe” talked about in the linked Sequence are to anticipate an experience, to anticipate anticipating an experience, to cheer for a team, and to signal group membership. And of course, that’s not all. Some people in the atheist movement, for instance, use the word “belief” sometimes to refer to unjustified or faith-based models-of-the-world.

Now, there is a very interesting other word in philosophy and psychology: “alief.” To alieve something is to have a deep, instinctual, subconscious belief, and the word is used especially when this subconscious feeling is at odds with the conscious mind. The W uses a few examples to explain the concept, like the person who is standing on a transparent balcony and, in spite of believing themself safe, alieves the danger of falling.

This is a very interesting (and fairly obvious, after you grok the difference between your Systems 1 and 2) internal dichotomy. Ideally, we want our beliefs and aliefs to be identical, and whenever we change our beliefs we’d like to likewise change our aliefs. And I think much of what Yudkowsky means when he talks about making beliefs pay rent refers exactly to this concept, turning beliefs into aliefs. This would seem to be very useful for rationality in general – a large part of rationality techniques consists of a bunch of heuristics for turning conscious deliberations into intuitive judgements. And of course, it’s very hard to do.

Pascal’s Wager (the one that says that, on the off-chance that god does in fact exist and will punish you for not believing, you should believe in it) has lots of flaws in it, but I think this is a particularly severe one. Sure, maybe the human brain is absolutely and completely insane in how it translates beliefs into aliefs and vice-versa, but it seems to me that, most of the time, you can’t just, by an effort of will, force it to turn a belief into an alief. And Pascal himself admitted this, and said that what the rational person should do is act and behave as if they believed until they actually did. And I’m sure that would work with some people, eventually, in the sense that they’d believe they believe, they’d profess and cheer and wear their belief.

But I’ll be damned if any amount of praying will actually convince me, on the brink of death, that I’m about to meet the Creator.

Or some such, depending on which religion you’re talking about.

And one would think maybe a just god would reward honesty more than barefaced self-manipulation.

Whatever the case, you can't just choose to anticipate experiences: either you do, or you don't, for good or for ill. And the brain isn't completely stupid – if it didn't move somewhat according to evidence, it would've been selected out of the gene pool a long time ago – but it's not terribly efficient or smart about it, and its belief → alief translation procedure can be overridden by a lot of other modules, or twisted and hacked into unrecognisability. But it seems that, in general, a lot of rationality heuristics boil down to: okay, this is the normatively correct way to think – how do I internalise it?

I don’t know. It appears to take lots of practice or some such, and different kinds of belief require different kinds of alief-generating, and some people seem to be naturally better than others at this “taking ideas seriously” skill. But we all know that the whole of rationality isn’t limited to what Less Wrong has to offer, and as further research is done, well, I’d be eager to learn how to more efficiently internalise my beliefs.

Posted in Basic Rationality, Rationality

Learning Bayes [part 1]

I have a confession to make.

I don’t actually know Bayesian statistics.

Or, any statistics at all, really.

Shocking, I know. But hear me out.

What I know is… Bayesian theory. I can derive Bayes’ Theorem, and I can probably derive most results that follow from it. I’m a good mathematician. But I haven’t actually spent any time doing practical Bayesian statistics. Very often a friend, like raginrayguns, tells me about a thing, a super cool thing, or asks me a thing, a super confusing thing, and it sort of goes over my head and I have to work everything out from scratch to find my way. I don’t have the heuristics; I don’t have all the problem-solving techniques readily available to me.

For example, earlier today another friend of mine came up to me and asked me a question about the difference between Bayesian and frequentist statistics. Basically, he has a bunch of data about a lot of bridges, and then four pieces of information about each of them: cost, material, length, and astrological sign of the designer of the bridge. He wanted – I had to ask a lot of questions to figure this out because, as I said, I don’t do statistics yet, I don’t know the jargon – to find the posterior distribution for the cost of his projected bridge, given a material, a length, and his astrological sign. Or rather, he wanted the Bayesian answer, because he knew the frequentist one already.
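(For the curious, here's roughly the shape one textbook Bayesian answer could take – a minimal sketch with made-up data and a deliberately simple model: normal likelihood for the cost, normal prior on the coefficients, noise variance assumed known. I'm not claiming this is what my friend actually needed; it's just the kind of machinery I mean when I say I know the theory but haven't practised the statistics.)

```python
import numpy as np

# Made-up data: length (m), material (0 = steel, 1 = concrete), cost (millions).
# The astrological sign is deliberately left out; it shouldn't predict anything anyway.
lengths   = np.array([120.0, 300.0, 80.0, 450.0, 200.0])
materials = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
costs     = np.array([2.1, 6.5, 1.4, 9.8, 4.2])

X = np.column_stack([np.ones_like(lengths), lengths, materials])  # design matrix

alpha = 1.0  # precision of the N(0, 1/alpha) prior on each coefficient
beta = 4.0   # noise precision of the normal likelihood, assumed known

# Conjugate posterior over the coefficients: N(m, S).
S = np.linalg.inv(alpha * np.eye(X.shape[1]) + beta * X.T @ X)
m = beta * S @ X.T @ costs

# Posterior predictive for a hypothetical 250 m concrete bridge:
# a normal with mean x_new·m and variance 1/beta + x_newᵀ S x_new.
x_new = np.array([1.0, 250.0, 1.0])
pred_mean = x_new @ m
pred_sd = np.sqrt(1 / beta + x_new @ S @ x_new)

print(f"posterior predictive cost: {pred_mean:.2f} ± {pred_sd:.2f}")
```

The point being: the Bayesian answer is a whole distribution over the cost, not a single point estimate.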

Let me pause this a bit, and talk about another problem.

Continue reading

Posted in Mathematics, Probability Theory

Bayesian falsification and the strength of a hypothesis

At the end of my post about other ways of looking at probability, I showed you a graph of evidence against probability. This is the relevant graph:

Looking at this graph was one of the most useful things I’ve ever done as a Bayesian. It shows, as I explained, exactly where most of the difficulty lies in proving things, in coming up with hypotheses, etc. Another interesting aspect is the symmetry, showing that, to a Bayesian, confirmation and disconfirmation are pretty much the same thing. Probability doesn’t have a prejudice for or against any hypothesis: you just gather evidence and slide along the graph. Naïvely, then, the concept of Popperian falsification doesn’t look terribly useful or relevant or particularly true.
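If you don't have that post handy, here's a minimal sketch of the curve in question – probability as a function of the evidence for a hypothesis, with evidence measured here in decibels of odds (that's an assumption of convenience; use natural log-odds if you prefer):

```python
def probability_from_evidence(e_db):
    """Convert evidence, expressed as 10 * log10(odds), back into a probability."""
    odds = 10 ** (e_db / 10)
    return odds / (1 + odds)

for e in [-30, -10, 0, 10, 30]:
    print(f"{e:+d} dB -> p = {probability_from_evidence(e):.4f}")
# -30 dB gives ~0.001, +30 dB gives ~0.999; in general p(-e) == 1 - p(e),
# which is the symmetry: disconfirming H is just confirming not-H.
```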

So whence comes the success of the idea?

Continue reading

Posted in Mathematics, Probability Theory, Rationality

How and when to respect authority

When I discussed the usefulness (or lack thereof) of Aumann’s Agreement Theorem, I mentioned that the next best thing to sharing the actual knowledge you gathered (or mind melding) was sharing likelihood ratios.
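(As a quick reminder of why sharing likelihood ratios works – under the idealised assumption that our two pieces of evidence, E_1 and E_2 , are independent given the hypothesis and given its negation – the odds simply multiply:

\frac{P(H|E_1E_2)}{P(\bar H|E_1E_2)} = \frac{P(H)}{P(\bar H)}\cdot\frac{P(E_1|H)}{P(E_1|\bar H)}\cdot\frac{P(E_2|H)}{P(E_2|\bar H)}

so if you hand me your likelihood ratio for E_2 , I can update as if I had seen your evidence myself.)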

But sometimes… you can’t. Well, most of the time, really. Or all the time. Humans do not actually have little magical plausibility fluids in their heads that flow between hypotheses and are kept track of dutifully by some internal Probability Inspector, just like humans do not actually have utility functions. If a Bayesian tells you that they believe a thing with 40% probability… either they’re crazy, or they’re Omega, or they’re giving you a ballpark estimate of subjective feelings of uncertainty.

And then there are the times when your fellow rationalist… is not actually someone you know. They might be a friend of a friend, or a famous scientist, or just the abstract entity of Science.

Continue reading

Posted in Basic Rationality, Rationality

How to prove stuff

A while ago, I wrote up a post that explained what a mathematical proof is. In short, a mathematical proof is a bunch of sentences that follow from other sentences. And when mathematicians have been trying to prove stuff for hundreds of years, well, we're bound to have gotten fairly good at it. And to have developed techniques.

So, then. Given any theory (that is, a set of logical sentences) \mathcal T , when a sentence S is a theorem (that is, it can be proven from the theory), we write \mathcal T\vdash S. And if we want to prove a thing, it may not be the case that we actually know that the thing is true. Sometimes we just have an intuition that it may be true. Or maybe we know it’s true because some other mathematician has told us it’s true, but we don’t see how. So we need to find a way to do it.
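For a concrete (and deliberately trivial) example of the notation: if \mathcal T = \{P, P\rightarrow Q\} , then \mathcal T\vdash Q , and the proof is just the list of sentences P , P\rightarrow Q , Q , the last of which follows from the first two by modus ponens.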

Continue reading

Posted in Intuitive Mathematics, Logic, Mathematics

Agreements, disagreements, and likelihood ratios

The LessWrong community has, as a sort of deeply ingrained instinct/rule, the principle that we should never “agree to disagree” about factual matters. The map is not the territory, and if we disagree about the territory, that means at least one of our maps is incorrect. You will also see us citing this mysterious “Aumann’s Agreement Theorem.”

I wish to explain this. Aumann’s theorem says, broadly speaking, that two rational agents who share the same priors and whose posteriors are common knowledge cannot agree to disagree on any factual matter. You’ll notice that I ominously italicised the words “common knowledge.” There is a good reason for that.

Common knowledge is a much stricter condition than it sounds. Suppose you and I are reasoning about some proposition A. My background knowledge is given by \mathcal X and yours is given by \mathcal Y , and we have that P(A|\mathcal X) = p and P(A|\mathcal Y) = q . A proposition C is called common knowledge of two agents with respect to some other proposition A if:

  1. C implies that you and I both know C.
  2. I would have assigned probability p to A no matter what I saw in addition to C.
  3. You would have assigned probability q to A no matter what you saw in addition to C.

…this doesn’t sound very useful. When it’s put that way, it’s pretty clear that the theorem is true. I mean, that’s basically saying that, for any proposition E \in \mathcal X , I would have that P(A|CE) = P(A|C) = p and something similar would go for you. Since we’re both rational agents, that’d mean that p = q = P(A|C) .
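Spelling that out: if E is everything I know besides C, and F is everything you know besides C, then conditions 2 and 3 give

p = P(A|\mathcal X) = P(A|CE) = P(A|C) = P(A|CF) = P(A|\mathcal Y) = q.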

Continue reading

Posted in Basic Rationality, Mathematics, Probability Theory, Rationality

Bayes’ Theorem

Bayes’ Theorem has many, many introductions online already. Those show the intuition behind using the theorem. This is going to be a step-by-step mathematical derivation of the theorem, as Jaynes explained it in his book Probability Theory: The Logic of Science. However, he himself has skipped a bunch of steps, and not always made his reasoning as clear as possible, so what I’m going to do here is elaborate, expand, and explain his steps.

The maths can be quite complex, but I think anyone can follow the ideas. But still, maths cw! So, let’s go, shall we?

The Desiderata

What we want is to create a way to measure our uncertainty of propositions. Or rather, to measure their plausibility. We want to know exactly how sure we are that something is true. We won’t give many constraints, though. We’re trying to be minimal in our axioms here. So the first desideratum is

  • Degrees of plausibility are represented by real numbers.

We’re not going to say anything about any upper or lower bounds. We don’t know yet which real numbers should represent certainty. All we know is that real numbers must be used to represent our plausibility.

We will adopt the convention that a greater plausibility is represented by a greater number. This isn't necessary, of course, but it's easier on the eye. We shall also suppose continuity, which is to say that an infinitesimally greater plausibility should correspond to only an infinitesimally greater number.

And even this axiom is not incredibly intuitive. Sometimes you don’t even have any idea of how plausible a thing is. However, we’re trying to design an optimal method of reasoning, and I think this is a reasonable thing to expect. You frequently have to make decisions based on incomplete information, and there is some meaningful sense in which you think some states-of-the-world are more or less plausible than others, more or less likely to happen. So it’s that meaningful sense we’re trying to capture here.

  • Qualitative correspondence with common sense

This is a sort of catch-all axiom, and it's very important. It's the axiom that says that the meaning of (A|B) is “the plausibility of A, given that B is true.” The argument this proof will try to make is that certain things are desirable of an agent's reasoning process, and that, at the end, we'll arrive at certain rules. Even if those desiderata admit multiple interpretations, we're taking a diachronic one – the interpretation on which (A|B) means the plausibility assigned to A after B has been observed. Under the diachronic interpretation, we do prove those rules, which means that violating those rules implies that you violated some of the desiderata.

For instance, we should expect that if we observe evidence in favour of something, that something should be more plausible, and vice-versa; that is, if I have that some event B makes A more likely, then my plausibility (A|B) > (A) and also (\bar A|B) < (\bar A) .

So the way this axiom works is: sometimes I will invoke it, and justify the invocation by what one would expect to be fairly reasonable diachronic assumptions for belief-updating. At the end, I will show that these assumptions pin the rules of probability down uniquely, and any agent that reasons in a way that's not isomorphic to these rules will therefore necessarily be violating at least one of these assumptions.

  • Consistency

This is in fact a stronger claim than it looks. This system of measuring probability will have a bunch of properties which we label collectively “consistency”: if a result can be reasoned out in more than one way, every way must give the same result; every bit of relevant information is taken into account; and equivalent states of knowledge are represented by the same numbers.

An important point about this is that this assumption is about states of knowledge and not logical status. It may very well be that two propositions are logically equivalent or otherwise connected, but an agent is only constrained by that if they know about this logical link (as I discuss here and here).
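To give the flavour of the first of those properties (and to anticipate the derivation a little): the plausibility of A and B given C can be reasoned out by assessing B first and then A given B, or A first and then B given A, and consistency demands that both routes agree. In the finished theory this is exactly the familiar product rule,

(AB|C) = (A|BC)(B|C) = (B|AC)(A|C).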

And now, believe it or not… we’re done. This is enough for us to find Bayes’ Theorem.

Continue reading

Posted in Mathematics, Probability Theory, Rationality

MIRI paper on logical uncertainty

I was talking to raginrayguns again, and he mentioned that a month and a half ago Paul Christiano wrote a paper on exactly the subject of logical uncertainty. While I haven't finished reading it yet, I'll post it here because it's relevant.

Here you go.

Posted in Logic, Mathematics, Probability Theory, Rationality

Logical Uncertainty: an addendum

And I forgot to mention one relevant thing in the last post. Gaifman, in his paper, states that if in \mathcal A  we have that A\rightarrow B then P(B|\mathcal A) \geq P(A|\mathcal A). I'll quickly show that that's a theorem of my approach and, indeed, of any similar approach.

Suppose that ``A\rightarrow B" \in X. In that case, then, my approach has that P(B|AX) = 1, because the agent knows B is logically implied by A. If that’s the case, then:

\begin{aligned} P(B|X) &= P(B|AX)P(A|X)+P(B|\bar AX)P(\bar A|X) \\  &=P(A|X)+P(B|\bar AX)P(\bar A|X)\\  &\geq P(A|X)\end{aligned}

With equality if and only if either P(A|X) = 1 (A is certain given X) or P(B|\bar AX) = 0 (B is also impossible when A is false, which means that, given X, B holds exactly when A does). So maybe this was fairly obvious to you, but if it wasn't, now you have that proof in your background list of proofs and theorems!
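If you'd like a quick numerical sanity check, here's a minimal sketch – the numbers are made up, and the only structural assumption is that ``A\rightarrow B" \in X , so that P(B|AX) = 1 :

```python
# Made-up numbers, purely for illustration.
p_A = 0.3              # P(A|X)
p_B_given_notA = 0.5   # P(B|~A X)

# Since "A -> B" is in X, the agent assigns P(B|AX) = 1.
p_B = 1.0 * p_A + p_B_given_notA * (1 - p_A)  # total probability

print(p_B)           # 0.65
print(p_B >= p_A)    # True, as the theorem demands
```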

Posted in Logic, Mathematics, Probability Theory, Rationality