## The use of Less Wrong

I’ve been planning on writing a post along these lines, and the recent thing on tumblr about the LW community has given me just the right motivation and environment for it. Specifically this nostalgebraist post gave me the inspiration I needed. He described the belief-content of LW as either obvious, false, or benign self-help advice one can find in many other places.

Now, nostalgebraist isn’t a LWer. I am. So let me say what the belief-content of LW looks like, to me. Why do I think LW-type “rationality” is useful? What’s the use of it all? Is it just the norms of discourse?

And of course you have to take this with a grain of salt. I’m a LWer, so I’m severely biased in favour of it, compared to baseline. And even nostalgebraist is pretty warm towards the community, or at least the tumblr community, so even his opinion is somewhat closer to positive than baseline. To properly avoid confirmation bias, you’d want to seek out the opinions of people who have had bad experiences with LW. I’ve seen quite a few of those on tumblr too, but none really outside of tumblr, so there’s also the set of biases that come with that. Consider this paragraph your disclaimer: I’m not an objective outside observer. This is the view from the inside: why I personally think LW is useful, and why I (partially) disagree with nostalgebraist.

I think my first problem is: nostalgebraist is smart. And he’s got a certain kind of smarts, one that I find with some frequency in LW, that makes him say stuff like “‘many philosophical debates are the results of disagreements over semantics’ — yeah, we know.” The first point is: we don’t. I don’t know if I’m too used to dealing with people outside of LW, or if he’s too used to dealing with people around as smart as he is, but this sort of thing is not, in fact, obvious. Points like “don’t argue over words” and “the map is not the territory” and “if you don’t consciously watch yourself you will likely suffer from these biases” aren’t obvious! Most people don’t get them! I didn’t get them before I read LW, and the vast majority of people I meet (from one of the 100 best engineering schools in the world) don’t know them!

LW-type “insights” are not, in fact, obvious to most people. Most people – and yes I’m including academics, scientists, mathematicians, whatever, people traditionally considered intelligent – do in fact spend most of their lives ignoring this completely. So I’ll get back to what exactly those insights may be later.

The second problem is… I also think he’s objectively wrong about what beliefs are actually common amongst LWers. Just take a look at the 2013 LW Survey Results. In fact, the website itself barely talks about FAI, so I don’t understand where the idea that Singularity-type beliefs are widespread comes from. Maybe it’s because no one outside of LW talks about FAI and the Singularity at all, and we talk a little about it? I dunno; in my personal experience with LW, much less than 0.5% of the time we spend talking is dedicated to this kind of discussion, and even belief in the Singularity/FAI is oftentimes permeated with qualifiers and ifs and buts. And even the hardcore Bayesian thing isn’t all that settled either.

At any rate, there’s much more to it than just that.

## Absence of evidence is evidence of absence

The W’s article about Evidence of Absence is confusing. They have an anecdote:

A simple example of evidence of absence: A baker never fails to put finished pies on her windowsill, so if there is no pie on the windowsill, then no finished pies exist. This can be formulated as modus tollens in propositional logic: P implies Q, but Q is false, therefore P is false.

But then they go on to say: “Per the traditional aphorism, ‘absence of evidence is not evidence of absence’, positive evidence of this kind is distinct from a lack of evidence or ignorance of that which should have been found already, had it existed.”

And at this point I go all ?????.

And then they continue with an Irving Copi quote: “In some circumstances it can be safely assumed that if a certain event had occurred, evidence of it could be discovered by qualified investigators. In such circumstances it is perfectly reasonable to take the absence of proof of its occurrence as positive proof of its non-occurrence.”

UM.

Alright so, trying to untangle this mess, they seem to want to make a qualitative distinction between “high-expectation evidence” and “low-expectation evidence.” Now, if you have read other stuff on this blog, like stuff about Bayes’ Theorem and the Bayesian definition of evidence and the many ways to look at probability and… Well, you must know by now that probability theory has no qualitative distinctions. Everything is quantitative. Any sharp divisions are strictly ad hoc and arbitrary and not natural clusters of conceptspace.

Thankfully, there is another quote in that W article that’s closer to the mark:

If someone were to assert that there is an elephant on the quad, then the failure to observe an elephant there would be good reason to think that there is no elephant there. But if someone were to assert that there is a flea on the quad, then one’s failure to observe it there would not constitute good evidence that there is no flea on the quad. The salient difference between these two cases is that in the one, but not the other, we should expect to see some evidence of the entity if in fact it existed. Moreover, the justification conferred in such cases will be proportional to the ratio between the amount of evidence that we do have and the amount that we should expect to have if the entity existed. If the ratio is small, then little justification is conferred on the belief that the entity does not exist. [For example] in the absence of evidence rendering the existence of some entity probable, we are justified in believing that it does not exist, provided that (1) it is not something that might leave no traces and (2) we have comprehensively surveyed the area where the evidence would be found if the entity existed…
—J.P. Moreland and W.L. Craig, Philosophical Foundations for a Christian Worldview

This looks much more like Bayesian reasoning than the rest of that article did. But let’s delve deeper and see how to prove a negative.
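The elephant/flea contrast can be made fully quantitative with one application of Bayes’ Theorem. Here’s a minimal sketch; the detection probabilities are invented for illustration, and I assume no false positives:

```python
def posterior_after_no_sighting(prior, p_detect):
    """Posterior probability that the entity exists, given that we looked
    and saw nothing.

    p_detect is P(see evidence | entity exists); we assume no false
    positives, i.e. P(see evidence | entity absent) = 0.
    """
    # P(no sighting | exists) = 1 - p_detect, while P(no sighting | absent) = 1.
    numerator = prior * (1 - p_detect)
    return numerator / (numerator + (1 - prior))

prior = 0.5
elephant = posterior_after_no_sighting(prior, p_detect=0.99)  # hard to miss
flea = posterior_after_no_sighting(prior, p_detect=0.01)      # easy to miss

# Not seeing the elephant crushes the hypothesis (posterior ~0.01); not
# seeing the flea barely moves it (posterior ~0.497).
```

Absence of evidence is evidence of absence in both cases; it’s just overwhelmingly stronger in the case where evidence was strongly expected.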

## Beliefs and aliefs

What does it mean to believe?

This is not supposed to be some Deeply Wise prod to make someone write philosophical accounts of the mystical uniqueness of human consciousness or some such. It’s an actual question about the actual meaning of the actual word. Not that words have intrinsic meanings, of course, but what do we mean when we use this word?

And like many good words in the English language, it has a lot of meanings.

LessWrong has a lot of talk about this. Amongst the meanings of the verb “to believe” talked about in the linked Sequence are to anticipate an experience, to anticipate anticipating an experience, to cheer for a team, and to signal group membership. And of course, that’s not all. Some people in the atheist movement, for instance, use the word “belief” sometimes to refer to unjustified or faith-based models-of-the-world.

Now, there is a very interesting other word in philosophy and psychology: “alief.” To alieve something is to have a deep, instinctual, subconscious belief, and the word is used especially when this subconscious feeling is at odds with the conscious mind. The W uses a few examples to explain the concept, like the person who is standing on a transparent balcony and, in spite of believing themself safe, alieves the danger of falling.

This is a very interesting (and fairly obvious, after you grok the difference between your Systems 1 and 2) internal dichotomy. Ideally, we want our beliefs and aliefs to be identical, and whenever we change our beliefs we’d like to likewise change our aliefs. And I think much of what Yudkowsky means when he talks about making beliefs pay rent refers exactly to this concept, turning beliefs into aliefs. This would seem to be very useful for rationality in general – a large part of rationality techniques consists of a bunch of heuristics for turning conscious deliberations into intuitive judgements. And of course, it’s very hard to do.

Pascal’s Wager (the one that says that, on the off-chance that god does in fact exist and will punish you for not believing, you should believe in it) has lots of flaws in it, but I think this is a particularly severe one. Sure, maybe the human brain is absolutely and completely insane in how it translates beliefs into aliefs and vice-versa, but it seems to me that, most of the time, you can’t just, by an effort of will, force it to turn a belief into an alief. And Pascal himself admitted this, and said that what the rational person should do is act and behave as if they believed until they actually did. And I’m sure that would work with some people, eventually, in the sense that they’d believe they believe, they’d profess and cheer and wear their belief.

But I’ll be damned if any amount of praying will actually convince me, on the brink of death, that I’m about to meet the Creator.

Or some such, depending on which religion you’re talking about.

And one would think maybe a just god would reward honesty more than barefaced self-manipulation.

Whichever the case, you can’t just choose to anticipate experiences: either you do, or you don’t, for good or for ill. And the brain isn’t completely stupid – if it didn’t move somewhat according to evidence it would’ve been selected out of the gene pool a long time ago – but it’s not terribly efficient or smart about it, and its belief → alief translation procedure can be overridden by a lot of other modules, or twisted and hacked into unrecognisability. But it seems that, in general, a lot of rationality heuristics boil down to: okay, this is the normatively correct way to think – how do I internalise it?

I don’t know. It appears to take lots of practice or some such, and different kinds of belief require different kinds of alief-generating, and some people seem to be naturally better than others at this “taking ideas seriously” skill. But we all know that the whole of rationality isn’t limited to what Less Wrong has to offer, and as further research is done, well, I’d be eager to learn how to more efficiently internalise my beliefs.

## Learning Bayes [part 1]

I have a confession to make.

I don’t actually know Bayesian statistics.

Or, any statistics at all, really.

Shocking, I know. But hear me out.

What I know is… Bayesian theory. I can derive Bayes’ Theorem, and I also can probably derive most results from it. I’m a good mathematician. But I haven’t actually spent any time doing practical Bayesian statistics stuff. Very often a friend, like raginrayguns, tells me about a thing, a super cool thing, or asks me a thing, a super confusing thing, and it sort of goes over my head and I have to really do stuff from scratch to figure out my way. I don’t have the heuristics, I don’t have all the techniques of problem-solving perfectly available to me.

For example, earlier today another friend of mine came up to me and asked me a question about the difference between Bayesian and frequentist statistics. Basically, he has a bunch of data about a lot of bridges, and then four pieces of information about each of them: cost, material, length, and astrological sign of the designer of the bridge. He wanted – I had to ask a lot of questions to figure this out because, as I said, I don’t do statistics yet, I don’t know the jargon – to find the posterior distribution for the cost of his projected bridge, given a material, a length, and his astrological sign. Or rather, he wanted the Bayesian answer, because he knew the frequentist one already.

Let me pause this a bit, and talk about another problem.

## Bayesian falsification and the strength of a hypothesis

At the end of my post about other ways of looking at probability, I showed you a graph of evidence against probability. This is the relevant graph:

Looking at this graph was one of the most useful things I’ve ever done as a Bayesian. It shows, as I explained, exactly where most of the difficulty is in proving things, in coming up with hypotheses, etc. Another interesting aspect is the symmetry, showing that, to a Bayesian, confirmation and disconfirmation are pretty much the same thing. Probability doesn’t have a prejudice for or against any hypothesis; you just gather evidence and slide along the graph. Naïvely, the concept of Popperian falsification doesn’t look terribly useful or relevant or particularly true.

So whence comes the success of the idea?


## How and when to respect authority

When I discussed the usefulness (or lack thereof) of Aumann’s Agreement Theorem, I mentioned that the next best thing to sharing the actual knowledge you gathered (or mind melding) was sharing likelihood ratios.

But sometimes… you can’t. Well, most of the time, really. Or all the time. Humans do not actually have little magical plausibility fluids in their heads that flow between hypotheses and are kept track of dutifully by some internal Probability Inspector, just like humans do not actually have utility functions. If a Bayesian tells you that they believe a thing with 40% probability… either they’re crazy, or they’re Omega, or they’re giving you a ballpark estimate of subjective feelings of uncertainty.

And then there’s the time when your fellow rationalist… is not actually someone you know. They might be a friend of a friend, or a famous scientist, or just the abstract entity of Science.

## How to prove stuff

A while ago, I wrote up a post that explained what a mathematical proof is. In short, a mathematical proof is a bunch of sentences that follow from other sentences. And when mathematicians have been trying to prove stuff for hundreds of years, well, we’re bound to have gotten fairly good at it. And to have developed techniques.

So, then. Given any theory (that is, a set of logical sentences) $\mathcal T$, when a sentence S is a theorem (that is, it can be proven from the theory), we write $\mathcal T\vdash S$. And if we want to prove a thing, it may not be the case that we actually know that the thing is true. Sometimes we just have an intuition that it may be true. Or maybe we know it’s true because some other mathematician has told us it’s true, but we don’t see how. So we need to find a way to do it.

## Agreements, disagreements, and likelihood ratios

The LessWrong community has, as a sort of deeply ingrained instinct/rule, that we should never “agree to disagree” about factual matters. The map is not the territory, and if we disagree about the territory, that means at least one of our maps is incorrect. You will also see us citing this mysterious “Aumann’s Agreement Theorem.”

I wish to explain this. Aumann’s theorem says, broadly speaking, that two rational agents who share the same priors and whose posteriors are common knowledge cannot agree to disagree on any factual matter. You’ll notice that I ominously italicised the words “common knowledge.” There is a good reason for that.

Common knowledge is a much stricter condition than it sounds. Suppose you and I are reasoning about some proposition A. My background knowledge is given by $\mathcal X$ and yours is given by $\mathcal Y$, and we have that $P(A|\mathcal X) = p$ and $P(A|\mathcal Y) = q$. A proposition C is called common knowledge of two agents with respect to some other proposition A if:

1. C implies that you and I both know C.
2. I would have assigned probability $p$ to A no matter what I saw in addition to C.
3. You would have assigned probability $q$ to A no matter what you saw in addition to C.

…this doesn’t sound very useful. When it’s put that way, it’s pretty clear that the theorem is true. I mean, that’s basically saying that, for any proposition $E \in \mathcal X$, I would have that $P(A|CE) = P(A|C) = p$ and something similar would go for you. Since we’re both rational agents, that’d mean that $p = q = P(A|C)$.


## Bayes’ Theorem

Bayes’ Theorem has many, many introductions online already. Those show the intuition behind using the theorem. This is going to be a step-by-step mathematical derivation of the theorem, as Jaynes explained it in his book Probability Theory: The Logic of Science. However, he himself skipped a bunch of steps and didn’t always make his reasoning as clear as possible, so what I’m going to do here is elaborate, expand, and explain his steps.

The maths can be quite complex, but I think anyone can follow the ideas. But still, maths cw! So, let’s go, shall we?

### The Desiderata

What we want is to create a way to measure our uncertainty of propositions. Or rather, to measure their plausibility. We want to know exactly how sure we are that something is true. We won’t give many constraints, though. We’re trying to be minimal in our axioms here. So the first desideratum is

• Degrees of plausibility are represented by real numbers.

We’re not going to say anything about any upper or lower bounds. We don’t know yet which real numbers should represent certainty. All we know is that real numbers must be used to represent our plausibility.

We will adopt the convention that a greater plausibility is represented by a greater number. This isn’t necessary, of course, but it’s easier on the eye. We shall also suppose continuity, which is to say that infinitesimally small increases in plausibility should yield infinitesimally small changes in the corresponding number.

And even this axiom is not incredibly intuitive. Sometimes you don’t even have any idea of how plausible a thing is. However, we’re trying to design an optimal method of reasoning, and I think this is a reasonable thing to expect. You frequently have to make decisions based on incomplete information, and there is some meaningful sense in which you think some states-of-the-world are more or less plausible than others, more or less likely to happen. So it’s that meaningful sense we’re trying to capture here.

• Qualitative correspondence with common sense

This is a sort of catch-all axiom, and it’s very important. It’s the axiom that says that the meaning of $(A|B)$ is “the plausibility of A, given that B is true.” The argument this proof will try to make is that certain things are desirable of an agent’s reasoning process, and that at the end, we’ll arrive at certain rules. Even if these desiderata admit multiple interpretations, we’re taking the one that says that the proper meaning of something like $(A|B)$ is that you have knowledge that B was observed, and conditional on that knowledge, the plausibility of A is $(A|B)$. Under this interpretation, we do prove the reasoning rules, which means that violating those rules implies violating some of the desiderata.

For instance, we should expect that if we observe evidence in favour of something, that something should be more plausible, and vice-versa; that is, if I have that some event B makes A more likely, then my plausibility $(A|B) > (A)$ and also $(\bar A|B) < (\bar A)$.

So the way this axiom works is: sometimes I will invoke it, and justify it on something one would expect to be fairly reasonable assumptions for belief-updating. At the end, I will show that these assumptions pin the rules of probability down uniquely, and any agent that reasons in a way that’s not isomorphic to these rules will therefore necessarily be violating at least one of these assumptions.

An interesting feature of these desiderata is that time isn’t mentioned anywhere in them. And it shouldn’t be! Your reasoning has to be time-independent and in a certain sense objective, so time doesn’t enter at all. These rules are about states of uncertainty conditional on knowledge, and thus your reasoning depends exclusively on your knowledge itself and not on when it was obtained.

• Consistency

This is in fact a stronger claim than it looks. This system of measuring probability will have a bunch of properties which we label collectively “consistency,” namely the fact that two ways of arriving at a result should give the same result, every bit of information should be taken into account, and equivalent states of knowledge are represented by the same numbers.

An important point about this is that this assumption is about states of knowledge and not logical status. It may very well be that two propositions are logically equivalent or otherwise connected, but an agent is only constrained by that if they know about this logical link (as I discuss here and here).

And now, believe it or not… we’re done. This is enough for us to find Bayes’ Theorem.

## MIRI paper on logical uncertainty

I was talking to raginrayguns again and he mentioned that a month and a half ago, Paul Christiano wrote a paper exactly on the subject of logical uncertainty. I haven’t finished reading it yet, but I’ll publish it here because it’s relevant.

Here you go.