Alieving Rationality

Almost six years ago, Scott wrote a post on LW about the apparent lack of strong correlation between real-world instrumental success and the study of what he calls “x-rationality” – that is, OB/LW-style rationality, of the kind that’s above and beyond the regular skills you get from being a generally intelligent, thoughtful, and scientifically-minded person.

I’d be quite interested in hearing what his opinion is now, six years later, but my current one is that this situation hasn’t changed much, in general. In fact, I was linked to his post by a recent reply Ozy sent someone on zir blog, in which zie commented that zie didn’t spread LW memes because zie didn’t feel they were very useful. I’m not alone in this, then. (Let’s remember that CFAR exists now, though.)

I’d like to share my thoughts on another potential factor contributing to this situation, something alluded to by the post and many of its commenters (including Scott himself and Anna Salamon), something I’ve noticed that… I do. A skill, maybe.

Aspiring x-rationalists are the people who look at the mountain of stuff on Overcoming Bias, Less Wrong, and other such sources, and decide that it makes sense, that their lives would be improved by the application of these techniques, so they go on and learn everything about it. They memorise it, they absorb all these memes to the point of being able to recite many of the more famous quotes by heart. And yet there isn’t a strong correlation! We’re not producing superheroes every other Tuesday! What gives?

I’d say it’s that believing rationality and alieving rationality are really different things.


On Arrogance

arrogant
adjective
having or revealing an exaggerated sense of one’s own importance or abilities.

A friend of mine once mentioned, in a comment on some post or another in a Facebook debate group, that he had knowledge of maths far above the Brazilian average. That is a simple factual sentence, a true statement (which isn’t exactly surprising, given what the Brazilian average actually is). The next few comments called him arrogant.

(ETA: This is an even better example of what I’m talking about here.)

I wonder what goes on in people’s heads when they say something like that. And by “wonder” I mean “sigh exasperatedly at the silliness of rules of etiquette.”

It’s clear, if you look at society and people in general, that people do not like feeling inferior. Not only that, people dislike feeling inferior so much that it’s become a generalised heuristic not to show superiority in any aspect. It’s rude to be seen as better than anyone at anything. It will give you trouble in most social circles. That can probably be easily explained: if you’re superior at something, everyone feels jealous, and stops helping you socially, so you end up being worse off than if you were just average.

It’s okay to want to be better than yourself. But being better than other people? You have to be more humble! How can you possibly think you could actually be better than other people?? That’s incredibly arrogant of you!

Yudkowsky makes a distinction between humility and social modesty: the latter is the kind of social thing you have to show, the “don’t-stick-out” heuristic; the former is actual, real, rational humility, the kind that recognises exactly how sure you are about the outcome of a decision and what steps must be taken in order to minimise the possibility of disaster.

So people calling you arrogant is frequently, in fact, a motte-and-bailey argument. The definition I presented at the top – a false belief in one’s superiority (or even just a belief in one’s “general superiority”, as if such a thing existed) – is the motte. The bailey is expressing superior aptitude at anything at all without paying your dues to social modesty; it’s acknowledging your skills when they’re actually good. How dare you claim you’re better than anyone else? You’re just as flawed and imperfect as all of us! Even if you’re not. You have to pretend you are, just to not commit social suicide.

What I usually say is this: it’s not arrogance if it’s true.


On Magical Universes

Any sufficiently advanced technology is indistinguishable from magic.
— Sir Arthur C. Clarke (1917 – 2008)

The above quote is quite famous, at least amongst certain types of people. And the core idea is a pretty idealistic and hopeful one: technology will one day get so advanced that it will look like magic.

Or maybe it’s actually quite realistic, under another lens. If you brought a peasant from the Middle Ages to the present and showed them fast-moving gigantic flying metal contraptions, thin screens that show people on the other side of the world, and little gadgets that let you scry the past and communicate with your loved ones no matter where they are, the peasant would run away screeching: “WITCHCRAFT!” They wouldn’t run very far, they’d probably be hit by a car, but they’d run alright.

Sufficiently analysed magic is indistinguishable from science (warning: TV Tropes link). This sentence is similar to the quote starting the post, but it’s not nearly as deep or meaningful. Science is, after all, just the method. If a thing exists, then it falls under the scope of science. So if magic exists and works then it can be science’d. Let’s try to science it. Exactly how magical does magic have to be before it goes beyond the boundaries of what’s achievable by technology? Exactly how advanced does technology have to be before it’s far enough from our suspension of disbelief that we’re willing to call it magic?

A more practical question might be: what should you conclude about the universe once you observe magic in it?


Learning Bayes [part 2]

In part 1, I talked about the Bayesian way of dealing with, well, noise, in a certain sense: how do I figure out that I “should not” conditionalise on a person’s astrological sign when predicting the cost of the bridge they’ll build, but that I “should” conditionalise on the bridge’s material, without arbitrarily choosing the former to have zero influence and the latter to have “some” influence? This was brought up because a friend of mine was talking about stuff to me. And stuff.

And then that friend commented on the same post explaining that that did not quite get to the heart of what he was looking for. The best way I could find to phrase it was as a problem of Bayesian model selection between differently-parametrised models. And like I said, I have no training in statistics, so I talked to my friend raginrayguns about it, and after a long time during which we both discussed this and talked and had loads of fun, we (he) sort of reached a conclusion.

Maybe.

I mean, there’s a high probability that we (he) reached a conclusion.

So, suppose we have two possible models, M_1(\textbf a) and M_2(\textbf a, \textbf b) that could explain the data, and these models have a different number of parameters. \textbf a is a vector of the parameters both models have in common, and \textbf b is the vector of the parameters that are present only in the second model.

My friend’s example is the following: he has a scatter plot of some data showing an apparently linear relationship between two variables, Y = \alpha + \beta X + \varepsilon, where I suppose \varepsilon is normally distributed. Upon closer inspection, however, it looks like there are actually two lines instead of only one! He had two interns, each of whom collected 50 of the samples.

So the common parameters are \textbf a = (\alpha, \beta), and the parameters only the second model has are \textbf b = (\lambda_{\alpha 1}, \lambda_{\beta 1}, \lambda_{\alpha 2}, \lambda_{\beta 2}), which we’ll call the intern effect. In that case, each intern’s \alpha and \beta in the second model are the same \alpha and \beta from the first model plus that intern’s effect.
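To make this concrete, here’s a minimal sketch in Python of what the two models’ likelihood functions might look like. The variable names, the shared noise scale sigma, and the two-intern indexing are all assumptions for illustration, not my friend’s actual setup:

```python
import numpy as np
from scipy.stats import norm

# Model 1: one line for everyone, y = alpha + beta*x + noise.
def loglik_m1(alpha, beta, x, y, sigma=1.0):
    return norm.logpdf(y, loc=alpha + beta * x, scale=sigma).sum()

# Model 2: the same line plus an intern effect.  lam[i] holds
# (lambda_alpha, lambda_beta) for intern i, and `intern` says
# which intern collected each sample.
def loglik_m2(alpha, beta, lam, x, y, intern, sigma=1.0):
    a = alpha + lam[intern, 0]   # per-sample intercept
    b = beta + lam[intern, 1]    # per-sample slope
    return norm.logpdf(y, loc=a + b * x, scale=sigma).sum()
```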

So, nothing changes. To figure out the posterior probability of the parameters, we’d just use Bayes’ Theorem; same goes for the posterior of the models. But the old frequentist problem of “models with more parameters always have better fit” still remains. How do we get rid of it?

The trick is not judging a model based on its best set of parameters, but rather averaging over all of them. Let’s try this. Suppose the data is represented by d. Then we want the posteriors p(M_1|d, X) and p(M_2|d, X). Or maybe we just want the posterior odds for them. Whichever may be the case, we have:

p(M_1|d, X) = \frac{p(d|M_1, X)}{p(d|X)}p(M_1|X)

And then we can find the probability of the data given a model using the law of total probability:

p(d|M_1, X) = \int p(d|\textbf a, M_1, X)p(\textbf a|M_1, X)d\textbf a

And of course, the same applies for Model 2:

p(d|M_2, X) = \int\int p(d|\textbf a, \textbf b, M_2, X)p(\textbf a|\textbf b, M_2, X)p(\textbf b|M_2, X)d\textbf ad\textbf b

And in these, p(d|\textbf a, M_1, X) and p(d|\textbf a, \textbf b, M_2, X) are just the likelihood functions of traditional statistics. Then, the posterior odds – which are in general much more useful since they just define how much the evidence supports a hypothesis when compared to another instead of in absolute terms – are given by:

O(M_1:M_2|D, X) = \mathcal{LR}(M_1:M_2;D)O(M_1:M_2|X)

Where the likelihood ratio is just:

\mathcal{LR}(M_1:M_2;D) = \frac{\int P(D|\textbf a, M_1, X)p(\textbf a|M_1, X)d\textbf a}{\int\int P(D|\textbf a, \textbf b, M_2, X)p(\textbf a|\textbf b, M_2, X)p(\textbf b|M_2, X)d\textbf ad\textbf b}

Here I’m using capitals for the probability because I’m no longer talking about the specific sampling distribution but rather its value on the observed data D.

And there’s the thing that’s used here and that never ever shows up in frequentist statistics, as usual: the prior distribution for… well, everything. Besides the prior odds between the two models, even if we’re just interested in which of the two the data supports better, we still need the priors for the parameters themselves. And this method is, of course, more general than the case of two models sharing a core of common parameters where one has a few extra ones; the models can have two completely different sets of parameters, and you just do the same thing.

What would a prior for the parameters look like? Of course, it depends on the case. One would expect, perhaps, that in the linear case described by my friend, they’d be normally distributed with mean 0 (no idea which variance), or something. Usually at this point we just throw our arms up and do a non-Bayesian thing: choose something convenient. Note how I said that this is non-Bayesian. Picking and choosing priors that seem “reasonable” isn’t justifiable in the ideal case – there, the prior would be uniquely determined by our state of knowledge – but it’s the best we can do in practice, lacking perfect insight into our own states of knowledge.
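Continuing the sketch from above: the integrals can be approximated by plain Monte Carlo – draw parameters from their priors and average the likelihood over the draws. The standard normal priors here are exactly the kind of convenient choice I just described, assumed purely for illustration:

```python
def log_marginal_m1(x, y, n_draws=10_000, seed=0):
    """Monte Carlo estimate of log p(d|M1, X): draw (alpha, beta) from
    the prior and average the likelihood over the draws."""
    rng = np.random.default_rng(seed)
    alphas = rng.normal(0.0, 1.0, n_draws)   # prior: alpha ~ N(0, 1)
    betas = rng.normal(0.0, 1.0, n_draws)    # prior: beta  ~ N(0, 1)
    logls = np.array([loglik_m1(a, b, x, y) for a, b in zip(alphas, betas)])
    # log of the mean of exp(logls), computed stably
    m = logls.max()
    return m + np.log(np.mean(np.exp(logls - m)))

# The same recipe works for M2, additionally drawing the four lambdas
# from their prior.  The log likelihood ratio is then
#   log LR = log_marginal_m1(x, y) - log_marginal_m2(x, y),
# and the posterior odds are exp(log LR) times the prior odds.
```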

Alright. So now we have the posterior probability – or at least posterior odds – for the models. So do we just pick the one with the highest posterior and go with it?

Not quite. There are a few problems with this approach. First and foremost, this is a Decision Theory problem, and unless you care only about strict accuracy, there might be value in using a model you know to be wrong because it’s not too wrong and you can still draw useful inferences from it.

For example, suppose that instead of the difference between the two possible lines being due to different interns, it’s because a certain effect affects two populations differently. That would mean that, in order to estimate the effect in either population, you would have access to much less data than if you bundled both populations together and pretended they were under the exact same effect. And while this might sound dishonest, the difference between the two models might be small enough that the utility of having on average twice as much data is larger than the disutility of using a slightly incorrect model. But of course, there is no hard-and-fast rule about how to choose – none that’s a consensus, anyway – and this is sort of an intuitive choice to balance the tradeoff between fit and amount of data.

Another problem is that, even if you’re willing to actually just pick a model and run with it, maybe there is no preferred model. Maybe the posterior odds between the two models are one, or close enough. What do you do, then? Toss a coin?

No. Bayesian statistics has a thing called model averaging (using a mixture of the models). We want to make inferences, right? So that basically means getting a distribution for future data based on past data: p(d_f|D_p, X). We can, once more, use the law of total probability:

\begin{aligned}p(d_f|D_p, X) &= p(d_f|M_1, D_p, X)P(M_1|D_p, X) \\ &+ p(d_f|M_2, D_p, X)P(M_2| D_p, X) \\ &+p(d_f|\bar{M_1},\bar{M_2}, D_p, X)P(\bar{M_1}, \bar{M_2}|D_p, X) \end{aligned}

If we’re fairly confident that either model 1 or model 2 is the appropriate explanation for this data, i.e. P(\bar{M_1}, \bar{M_2}|D_p,X)\approx 0, then we can use only the first two terms. Even if we can’t quite neglect the third term, the first two can still give a good approximation. So we predict future data by taking a weighted average of the predictions of each model, where the weights are the posterior probabilities of those models.
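In code (continuing the sketch from before), that weighted average is just this, with the weights computed from the log marginal likelihoods and the prior model probabilities:

```python
def model_averaged_pred(pred1, pred2, logml1, logml2, prior1=0.5, prior2=0.5):
    """pred1, pred2: each model's predictive density p(d_f|M_i, D_p, X)
    evaluated at the future data; logml1, logml2: log marginal likelihoods."""
    logw = np.array([logml1 + np.log(prior1), logml2 + np.log(prior2)])
    logw -= logw.max()          # stabilise before exponentiating
    w = np.exp(logw)
    w /= w.sum()                # posterior model probabilities
    return w[0] * pred1 + w[1] * pred2
```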


The use of Less Wrong

I’ve been planning on writing a post along these lines, and the recent thing on tumblr about the LW community has given me just the right motivation and environment for it. Specifically, this nostalgebraist post gave me the inspiration I needed. He described the belief-content of LW as either obvious, false, or benign self-help advice one can find in many other places.

Now, nostalgebraist isn’t a LWer. I am. So let me say what the belief-content of LW looks like, to me. Why do I think LW-type “rationality” is useful? What’s the use of it all? Is it just the norms of discourse?

And of course you have to take this with a grain of salt. I’m a LWer, so I’m severely biased in favour of it, compared to baseline. And even nostalgebraist is pretty warm towards the community, or at least the tumblr community, so even his opinion is somewhat closer to positive than baseline. To properly avoid confirmation bias, you’d want to seek out the opinions of people who have had bad experiences with LW. I’ve seen quite a few of those on tumblr too, but none really outside of it, so there’s also the set of biases that come from there. This paragraph is supposed to be your disclaimer: I’m not an objective outside observer. This is the view from the inside: why I personally think LW is useful, and why I (partially) disagree with nostalgebraist.

I think my first problem is: nostalgebraist is smart. And he’s got a certain kind of smarts, one that I find with some frequency in LW, that makes him say stuff like “‘many philosophical debates are the results of disagreements over semantics’ — yeah, we know.” The first point is: we don’t. I don’t know if I’m too used to dealing with people outside of LW, or if he’s too used to dealing with people around as smart as he is, but this sort of thing is not, in fact, obvious. Points like “don’t argue over words” and “the map is not the territory” and “if you don’t consciously watch yourself you will likely suffer from these biases” aren’t obvious! Most people don’t get them! I didn’t get them before I read LW, and the vast majority of people I meet (at one of the 100 best engineering schools in the world) don’t know them!

LW-type “insights” are not, in fact, obvious to most people. Most people – and yes I’m including academics, scientists, mathematicians, whatever, people traditionally considered intelligent – do in fact spend most of their lives ignoring this completely. So I’ll get back to what exactly those insights may be later.

The second problem is… I also think he’s objectively wrong about what beliefs are actually common amongst LWers. Just take a look at the 2013 LW Survey Results. In fact, the website itself barely talks about FAI, so I don’t understand where the idea that Singularity-type beliefs are widespread comes from. Maybe it’s because people outside of LW don’t talk about FAI and the Singularity at all, and we talk a little about it? I dunno; my personal experience with LW is that much less than 0.5% of the time we spend talking is dedicated to this kind of discussion, and even belief in the Singularity/FAI is oftentimes permeated with qualifiers and ifs and buts. And even the hardcore Bayesian thing isn’t all that settled either.

At any rate, there’s much more to it than just that.


Absence of evidence is evidence of absence

The W’s article about Evidence of Absence is confusing. They have an anecdote:

A simple example of evidence of absence: A baker never fails to put finished pies on her windowsill, so if there is no pie on the windowsill, then no finished pies exist. This can be formulated as modus tollens in propositional logic: P implies Q, but Q is false, therefore P is false.

But then they go on to say: “Per the traditional aphorism, ‘absence of evidence is not evidence of absence’, positive evidence of this kind is distinct from a lack of evidence or ignorance[1] of that which should have been found already, had it existed.[2]”

And at this point I go all ?????.

And then they continue with an Irving Copi quote: “In some circumstances it can be safely assumed that if a certain event had occurred, evidence of it could be discovered by qualified investigators. In such circumstances it is perfectly reasonable to take the absence of proof of its occurrence as positive proof of its non-occurrence.”

UM.

Alright so, trying to untangle this mess, they seem to want to make a qualitative distinction between “high-expectation evidence” and “low-expectation evidence.” Now, if you have read other stuff on this blog, like stuff about Bayes’ Theorem and the Bayesian definition of evidence and the many ways to look at probability and… well, you must know by now that probability theory has no qualitative distinctions. Everything is quantitative. Any sharp divisions are strictly ad hoc and arbitrary, not natural clusters in conceptspace.

Thankfully, there is another quote in that W article that’s closer to the mark:

If someone were to assert that there is an elephant on the quad, then the failure to observe an elephant there would be good reason to think that there is no elephant there. But if someone were to assert that there is a flea on the quad, then one’s failure to observe it there would not constitute good evidence that there is no flea on the quad. The salient difference between these two cases is that in the one, but not the other, we should expect to see some evidence of the entity if in fact it existed. Moreover, the justification conferred in such cases will be proportional to the ratio between the amount of evidence that we do have and the amount that we should expect to have if the entity existed. If the ratio is small, then little justification is conferred on the belief that the entity does not exist. [For example] in the absence of evidence rendering the existence of some entity probable, we are justified in believing that it does not exist, provided that (1) it is not something that might leave no traces and (2) we have comprehensively surveyed the area where the evidence would be found if the entity existed…[5]
—J.P. Moreland and W.L. Craig, Philosophical Foundations for a Christian Worldview

This looks much more like Bayesian reasoning than the rest of that article did. But let’s delve deeper and see how to prove a negative.
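Before that, here’s the quad example worked out with made-up numbers: say you’d spot an elephant with probability 0.99 if it were on the quad, a flea with probability 0.01, and you’d never “spot” either if it weren’t there. Failing to observe then updates each hypothesis by its own likelihood ratio:

O(\text{elephant}|\text{no sighting}, X) = \frac{P(\text{no sighting}|\text{elephant}, X)}{P(\text{no sighting}|\text{no elephant}, X)}O(\text{elephant}|X) = \frac{1 - 0.99}{1}O(\text{elephant}|X)

O(\text{flea}|\text{no sighting}, X) = \frac{1 - 0.01}{1}O(\text{flea}|X) = 0.99\,O(\text{flea}|X)

The elephant’s odds drop a hundredfold; the flea’s barely move. Same quantitative rule in both cases, just very different numbers.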


Beliefs and aliefs

What does it mean to believe?

This is not supposed to be some Deeply Wise prod to make someone write philosophical accounts of the mystical uniqueness of human consciousness or some such. It’s an actual question about the actual meaning of the actual word. Not that words have intrinsic meanings, of course, but what do we mean when we use this word?

And like many good words in the English language, it has a lot of meanings.

LessWrong has a lot of talk about this. Amongst the meanings of the verb “to believe” talked about in the linked Sequence are to anticipate an experience, to anticipate anticipating an experience, to cheer for a team, and to signal group membership. And of course, that’s not all. Some people in the atheist movement, for instance, use the word “belief” sometimes to refer to unjustified or faith-based models-of-the-world.

Now, there is a very interesting other word in philosophy and psychology: “alief.” To alieve something is to have a deep, instinctual, subconscious belief, and the word is used especially when this subconscious feeling is at odds with the conscious mind. The W uses a few examples to explain the concept, like the person who is standing on a transparent balcony and, in spite of believing themself safe, alieves the danger of falling.

This is a very interesting (and fairly obvious, after you grok the difference between your Systems 1 and 2) internal dichotomy. Ideally, we want our beliefs and aliefs to be identical, and whenever we change our beliefs we’d like to likewise change our aliefs. And I think much of what Yudkowsky means when he talks about making beliefs pay rent refers exactly to this concept, turning beliefs into aliefs. This would seem to be very useful for rationality in general – a large part of rationality techniques consists of a bunch of heuristics for turning conscious deliberations into intuitive judgements. And of course, it’s very hard to do.

Pascal’s Wager (the one that says that, on the off-chance that god does in fact exist and will punish you for not believing, you should believe in it) has lots of flaws in it, but I think this is a particularly severe one. Sure, maybe the human brain is absolutely and completely insane in how it translates beliefs into aliefs and vice-versa, but it seems to me that, most of the time, you can’t just, by an effort of will, force it to turn a belief into an alief. And Pascal himself admitted this, and said that what the rational person should do is act and behave as if they believed until they actually did. And I’m sure that would work with some people, eventually, in the sense that they’d believe they believe, they’d profess and cheer and wear their belief.

But I’ll be damned if any amount of praying will actually convince me, on the brink of death, that I’m about to meet the Creator.

Or some such, depending on which religion you’re talking about.

And one would think maybe a just god would reward honesty more than barefaced self-manipulation.

Whatever the case, you can’t just choose to anticipate experiences: either you do, or you don’t, for good or for ill. And the brain isn’t completely stupid – if it didn’t move somewhat according to evidence, it would’ve been selected out of the gene pool a long time ago – but it’s not terribly efficient or smart about it, and its belief → alief translation procedure can be overridden by a lot of other modules, or twisted and hacked into unrecognisability. But it seems that, in general, a lot of rationality heuristics boil down to: okay, this is the normatively correct way to think – how do I internalise it?

I don’t know. It appears to take lots of practice or some such, and different kinds of belief require different kinds of alief-generating, and some people seem to be naturally better than others at this “taking ideas seriously” skill. But we all know that the whole of rationality isn’t limited to what Less Wrong has to offer, and as further research is done, well, I’d be eager to learn how to more efficiently internalise my beliefs.


Learning Bayes [part 1]

I have a confession to make.

I don’t actually know Bayesian statistics.

Or, any statistics at all, really.

Shocking, I know. But hear me out.

What I know is… Bayesian theory. I can derive Bayes’ Theorem, and I also can probably derive most results from it. I’m a good mathematician. But I haven’t actually spent any time doing practical Bayesian statistics stuff. Very often a friend, like raginrayguns, tells me about a thing, a super cool thing, or asks me a thing, a super confusing thing, and it sort of goes over my head and I have to really do stuff from scratch to figure out my way. I don’t have the heuristics, I don’t have all the techniques of problem-solving perfectly available to me.

For example, earlier today another friend of mine came up to me and asked me a question about the difference between Bayesian and frequentist statistics. Basically, he has a bunch of data about a lot of bridges, and then four pieces of information about each of them: cost, material, length, and astrological sign of the designer of the bridge. He wanted – I had to ask a lot of questions to figure this out because, as I said, I don’t do statistics yet, I don’t know the jargon – to find the posterior distribution for the cost of his projected bridge, given a material, a length, and his astrological sign. Or rather, he wanted the Bayesian answer, because he knew the frequentist one already.
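To give a flavour of what a Bayesian setup for this could look like – and this is a sketch I’m inventing for illustration, with a hypothetical feature encoding, not the answer I gave him – you could put length, material, and yes, astrological sign into a linear model for the cost, give every coefficient a zero-mean Gaussian prior, and let the posterior decide how much each feature matters:

```python
import numpy as np

def posterior_coefficients(X, y, sigma=1.0, tau=10.0):
    """Conjugate posterior for y ~ N(Xw, sigma^2 I) with prior w ~ N(0, tau^2 I).
    Columns of X: length, one-hot material, one-hot astrological sign."""
    k = X.shape[1]
    A = X.T @ X + (sigma**2 / tau**2) * np.eye(k)
    mean = np.linalg.solve(A, X.T @ y)   # posterior mean of the coefficients
    cov = sigma**2 * np.linalg.inv(A)    # posterior covariance
    return mean, cov
```

If the sign columns carry no information about cost, their posterior coefficients get pulled toward zero by the prior and the data’s silence; nobody has to decree their influence to be exactly zero in advance.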

Let me pause this a bit, and talk about another problem.


Bayesian falsification and the strength of a hypothesis

At the end of my post about other ways of looking at probability, I showed you a graph of evidence against probability. This is the relevant graph:

[graph: probability as a function of evidence – an S-shaped curve, nearly flat near 0 and 1]

Looking at this graph was one of the most useful things I’ve ever done as a Bayesian. It shows, as I explained, exactly where most of the difficulty is in proving things, in coming up with hypotheses, etc. Another interesting aspect is the symmetry, showing that, to a Bayesian, confirmation and disconfirmation are pretty much the same thing. Probability doesn’t have a prejudice for or against any hypothesis; you just gather evidence and slide along the graph. Naïvely, then, the concept of Popperian falsification doesn’t look terribly useful or relevant or particularly true.
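If you don’t have that post handy: assuming evidence is measured in log-odds, the curve is just the logistic function, which a few lines of Python reproduce:

```python
import numpy as np

def prob_from_evidence(e):
    """Probability as a function of accumulated evidence, in log-odds."""
    return 1.0 / (1.0 + np.exp(-e))

e = np.linspace(-10, 10, 9)
print(prob_from_evidence(e))
# Symmetric: prob_from_evidence(e) + prob_from_evidence(-e) == 1,
# which is the confirmation/disconfirmation symmetry mentioned above.
```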

So whence comes the success of the idea?


How and when to respect authority

When I discussed the usefulness (or lack thereof) of Aumann’s Agreement Theorem, I mentioned that the next best thing to sharing the actual knowledge you gathered (or mind melding) was sharing likelihood ratios.
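A reminder of why likelihood ratios are the thing to share: if two people’s pieces of evidence E_1 and E_2 are independent given the truth of the hypothesis, each person can report their own ratio and anyone can multiply them onto their own prior odds:

O(H|E_1, E_2, X) = \frac{P(E_1|H, X)}{P(E_1|\bar{H}, X)}\,\frac{P(E_2|H, X)}{P(E_2|\bar{H}, X)}\,O(H|X)

An idealisation, of course, as what follows makes clear.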

But sometimes… you can’t. Well, most of the time, really. Or all the time. Humans do not actually have little magical plausibility fluids in their heads that flow between hypotheses and are kept track of dutifully by some internal Probability Inspector, just like humans do not actually have utility functions. If a Bayesian tells you that they believe a thing with 40% probability… either they’re crazy, or they’re Omega, or they’re giving you a ballpark estimate of subjective feelings of uncertainty.

And then there are the times when your fellow rationalist… is not actually someone you know. They might be a friend of a friend, or a famous scientist, or just the abstract entity of Science.
