## An anti-conjunction fallacy, and why I’m a Singularitarian

When anyone talks about the possibility or probability of the creation of a UFAI, there are many failure modes into which lots of people fall. One of them is the logical fallacy of generalisation from fictional evidence, where people think up instances of AI in fiction and use those as an argument. Another is the tendency to "solve" a problem faster the harder it is, without spending even five minutes actually thinking about it. The absurdity heuristic makes an appearance, too.

But someone who’s familiar with LW or the whole cognitive biases shizzaz might be a bit cleverer and argue that most futurists get it wrong and predicting the future is actually really hard (conjunction fallacy). Ozy wrote a post about donating to MIRI in which zie points this out, but in the end mentions talking to, well, yours truly about it, and I think overall there are three points where I disagree with zir.

First, I propose the existence of a fallacy related to the conjunction fallacy and the sophisticated arguer effect, something I’ll call the Anti-Conjunction Fallacy, or perhaps the Disjunction Fallacy, or something. Maybe this is not a direct countercounterargument to Ozy’s point, but it’s a more general countercounterargument to the counterargument that “predicting AIs typically invokes a highly complex narrative with a high Complexity Penalty.”

The Conjunction Fallacy is a fancy name for the idea that sometimes people judge $P(A\land B) > P(A)$, which is to say that a more complex proposition, with more details, seems to us more probable than a simpler one because it appeals to our sense of narrative. This is a fallacy because it's a theorem of probability that the exact negation of that sentence is true, no matter what $A$ and $B$ are; that is, it is always the case that $P(A\land B) \leq P(A)$. But conversely, we have that $P(A\lor B)\geq P(A)$, that is, a disjunctive story is more likely than any of its components.

My proposed fallacy is this: many people (particularly rationalists) who see a long tale have an instinct to cry "complexity penalty" without actually checking whether the logical connective between the elements of that tale is a conjunction or a disjunction, AND or OR, and thus fall into the trap of assigning a disjunctive story a low probability. And in my experience, most AGI predictions seem to be heavily disjunctive, in that the people making them (such as Nick Bostrom in his book) suggest a myriad of possible disjunctive ways a superintelligence could arise, each of which is relatively probable given current trends (e.g. whole brain emulation is an active research area which has seen actual results), so the probability of the enterprise as a whole is much higher than that of any single path. This is true of many parts of the superintelligence narrative, from its formation to its takeoff to its potential powers. I don't need five minutes to think of five different ways a superintelligence could reasonably take over the world, and I'm not superintelligent.
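
The asymmetry is easy to check numerically. Here's a toy sketch (the five routes, their independence, and the 20% figure are all made-up conveniences, not estimates): individually unlikely paths still add up to a likely disjunction.

```python
# Five independent routes to an outcome, each given a (made-up) 20% chance.
p = 0.2
n = 5

p_conjunction = p ** n            # all five happen: vanishingly unlikely
p_disjunction = 1 - (1 - p) ** n  # at least one happens: likelier than any single route

print(f"P(all five routes)     = {p_conjunction:.5f}")
print(f"P(at least one route)  = {p_disjunction:.5f}")
```

The conjunction gets crushed to 0.032%, while the disjunction climbs to about 67%, more than three times the probability of any individual path. Crying "complexity penalty" is only appropriate for the first case.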

So the moral of this part here is that, when you see a long prediction about something, first see whether it’s disjunctive or conjunctive before looking for fallacies. Isaac Asimov may have been wrong about the exact picture the future would paint, but by golly a large number of his individual predictions did in fact come true!

My second point is not so much an objection as a sort of reminder about what MIRI is actually doing. I’m not sure what its original goals were, but it most certainly isn’t trying, by itself, to program a superintelligence, at least not right now. Ozy says:

So it seems possible the solution is not independent funding, but getting the entire AGI community on board with Friendliness as a project. At that point, I can assume that they will deal with it and I can return to thinking of technology funding as a black box from which iPhones and God-AIs come out.

The thing is, that is one of MIRI's explicit goals: outreach about the dangers of AI. And they seem to be at least mildly successful, or at any rate something was, given that Google created an AI Ethics board when it bought DeepMind, and given the growing number of prominent intellectuals who have been talking about the dangers of AI lately, some of whom directly mention MIRI.

My third and final objection is that I think zie misunderstood me when I talked about the predictive skill of people who actually build technologies. I didn’t mean that they have some magical insider information or predictive superpowers that allow them to know these things; I meant that when you’re the one building a thing, what you’re doing isn’t predicting as much as it is setting goals. Predicting what Google is going to do is one thing, being inside Google actually doing the things is a whole ‘nother, and when AGI researchers talk about AGI there is frequently an undertone of “even if no one else is gonna do it, I am.” Someone who works at MIRI isn’t concerned so much with the prediction that a superintelligence is possible as they are with their own ability to bring it about, or raise the odds of a good outcome if/when it does.

My last point is something Ozy touched upon and on which I want to elaborate. Zie mentioned that AGI is fundamentally different from earlier "large-scale" projects in that, unlike, say, nukes, the way it's done will severely impact its outcome. As it is, I'd argue that almost no conclusions at all can be drawn from the past funding and development of technological advances because… the sample space is tiny. We can't judge whether individuals funding research is an effective method of getting that research done, because this idea, and the means to do so effectively, are brand new. During the 20th century, most technological advances happened due to the military, but that's perfectly understandable given the climate: two full wars and a cold one between great powers, constant change in political and economic climates…

But large tech companies are a new invention, and it is my impression that, since at least the mid-nineties, most technological advancements have had at least some hand from the private sector, and this seems increasingly to be the case. I'm not at all sceptical of the ability of individually funded technologies, especially software technologies, to play a large part in the future, because that's what they're doing right now, in the present.

But at any rate, there are a number of ways AGI could come about, and MIRI is trying to do what it can. So far, other than MIRI, the FHI, and mmmmaaaaybe Google, it seems no one else is.


## Alieving Rationality

Almost six years ago, Scott wrote a post on LW about the apparent lack of strong correlation of real-world instrumental success and studying what he calls “x-rationality” – that is, OB/LW-style rationality, of the kind that’s above and beyond the regular skills you can get from being a generally intelligent, thoughtful, and scientifically-minded person.

I’d be quite interested in hearing what his opinion is six years into the future, but my current one is that this situation hasn’t changed much, in general. In fact, I was linked to his post by a recent reply Ozy sent someone on zir blog, while commenting that zie didn’t spread LW memes because zie didn’t feel they were very useful. I’m not alone in this, then. (Let’s remember that CFAR exists now, though.)

I’d like to share my thoughts on another potential factor contributing to this situation, something that was alluded to in the post and by many of its commenters (including Scott himself and Anna Salamon), something I’ve noticed that… I do. A skill, maybe.

Aspiring x-rationalists are the people who look at the mountain of stuff on Overcoming Bias, Less Wrong, and other such sources, and decide that it makes sense, that their lives would be improved by the application of these techniques, so they go on and learn everything about it. They memorise it, they absorb all these memes to the point of being able to recite many of the more famous quotes by heart. And yet there isn’t a strong correlation! We’re not producing superheroes every other Tuesday! What gives?

I’d say it’s that believing rationality and alieving rationality are really different things.

## On Arrogance

arrogant
having or revealing an exaggerated sense of one’s own importance or abilities.

A friend of mine once mentioned, in a comment on some post or other in a Facebook debate group, that he had knowledge of maths far above the Brazilian average. That is a simple factual sentence, a true statement (which isn’t exactly surprising given what the Brazilian average actually is). The next few comments called him arrogant.

(ETA: This is an even better example of what I’m talking about here.)

I wonder what goes on in people’s heads when they say something like that. And by “wonder” I mean “sigh exasperatedly at the silliness of rules of etiquette.”

It’s clear, if you look at society and people in general, that people do not like feeling inferior. Not only that, people dislike feeling inferior so much that it’s become a generalised heuristic not to show superiority in any aspect. It’s rude to be seen as better than anyone at anything. It will give you trouble in most social circles. That can probably be easily explained: if you’re superior at something, everyone feels jealous, and stops helping you socially, so you end up being worse off than if you were just average.

It’s okay to want to be better than yourself. But being better than other people? You have to be more humble! How can you possibly think you could actually be better than other people?? That’s incredibly arrogant of you!

Yudkowsky makes a distinction between humility and social modesty: the latter is the kind of social thing you have to show, the “don’t-stick-out” heuristic; the former is actual, real, rational humility, the kind that involves recognising exactly how sure you are about the outcome of a decision and what steps must be taken in order to minimise the possibility of disaster.

So people calling you arrogant is frequently, in fact, a motte-and-bailey argument. The definition I presented at the top, of a false belief in one’s superiority (or even just a belief in one’s “general superiority”, as if that existed), that’s the motte. The bailey is expressing superior aptitude at anything at all without paying your dues to social modesty; it’s acknowledging your skills when they’re actually good. How dare you claim you’re better than anyone else? You’re just as flawed and imperfect as all of us! Even if you’re not. You have to pretend you are, just to avoid committing social suicide.

What I usually say is this: it’s not arrogance if it’s true.

## On Magical Universes

Any sufficiently advanced technology is indistinguishable from magic.
— Sir Arthur C. Clarke (1917 – 2008)

The above quote is quite famous, at least amongst certain types of people. And the core idea is a pretty idealistic and hopeful one: technology will one day get so advanced that it will look like magic.

Or maybe it’s actually quite realistic, under another lens. If you brought a peasant from the Middle Ages to the present and showed them fast-moving gigantic flying metal contraptions, thin screens that show people on the other side of the world, and little gadgets that let you scry the past and communicate with your loved ones no matter where they are, the peasant would run away screeching: “WITCHCRAFT!” They wouldn’t run very far, they’d probably be hit by a car, but they’d run alright.

Sufficiently analysed magic is indistinguishable from science (warning: TV Tropes link). This sentence is similar to the quote starting the post, but it’s not nearly as deep or meaningful. Science is, after all, just the method. If a thing exists, then it falls under the scope of science. So if magic exists and works then it can be science’d. Let’s try to science it. Exactly how magical does magic have to be before it goes beyond the boundaries of what’s achievable by technology? Exactly how advanced does technology have to be before it’s far enough from our suspension of disbelief that we’re willing to call it magic?

A more practical question might be: what should you conclude about the universe once you observe magic in it?


## Learning Bayes [part 2]

In part 1, I talked about the Bayesian way of dealing with, well, noise, in a certain sense. How do I figure out that I “should not” conditionalise on a person’s astrological sign when predicting the cost of the bridge they’ll build, but that I “should” conditionalise on the bridge’s material, without arbitrarily choosing the former to have zero influence and the latter to have “some” influence? This was brought up because a friend of mine was talking about stuff to me. And stuff.

And then that friend commented on the same post explaining that that did not quite get to the heart of what he was looking for. The best way I could find to phrase it was as a problem of Bayesian model selection between differently-parametrised models. And like I said, I have no training in statistics, so I talked to my friend raginrayguns about it, and after a long time during which we both discussed this and talked and had loads of fun, we (he) sort of reached a conclusion.

Maybe.

I mean, there’s a high probability that we (he) reached a conclusion.

So, suppose we have two possible models, $M_1(\textbf a)$ and $M_2(\textbf a, \textbf b)$ that could explain the data, and these models have a different number of parameters. $\textbf a$ is a vector of the parameters both models have in common, and $\textbf b$ is the vector of the parameters that are present only in the second model.

My friend’s example is the following: he has a scatter plot of some data showing an apparently linear relationship between two variables, $Y = \alpha + \beta X + \varepsilon$, where I suppose $\varepsilon$ is normally distributed. Upon closer inspection, however, it looks like there are actually two lines instead of only one! There were two interns, each of whom collected 50 of the samples.

So the common parameters are $\textbf a = (\alpha, \beta)$, and the parameters only the second model has are $\textbf b = (\lambda_{\alpha 1}, \lambda_{\beta 1}, \lambda_{\alpha 2}, \lambda_{\beta 2})$ which we’ll call the intern effect. In that case, then, the αs and βs of the second model are going to be seen as the same α and β from the first plus this intern effect.

So, nothing changes. To figure out the posterior probability of the parameters, we’d just use Bayes’ Theorem; same goes for the posterior of the models. But the old frequentist problem of “models with more parameters always have better fit” still remains. How do we get rid of it?

The trick is not judging a model based on its best set of parameters, but rather averaging over all of them. Let’s try this. Suppose the data is represented by $d$. Then we want the posteriors $p(M_1|d, X)$ and $p(M_2|d, X)$. Or maybe we just want the posterior odds for them. Whichever may be the case, we have:

$p(M_1|d, X) = \frac{p(d|M_1, X)}{p(d|X)}p(M_1|X)$

And then we can find the probability of the data given a model using the law of total probability:

$p(d|M_1, X) = \int p(d|\textbf a, M_1, X)p(\textbf a|M_1, X)d\textbf a$

And of course, the same applies for Model 2:

$p(d|M_2, X) = \int\int p(d|\textbf a, \textbf b, M_2, X)p(\textbf a|\textbf b, M_2, X)p(\textbf b|M_2, X)d\textbf ad\textbf b$

And in these, $p(d|\textbf a, M_1, X)$ and $p(d|\textbf a, \textbf b, M_2, X)$ are just the likelihood functions of traditional statistics. Then the posterior odds, which are in general much more useful since they measure how much the evidence supports one hypothesis relative to another rather than in absolute terms, are given by:

$O(M_1:M_2|D, X) = \mathcal{LR}(M_1:M_2;D)O(M_1:M_2|X)$

Where the likelihood ratio is just:

$\mathcal{LR}(M_1:M_2;D) = \frac{\int P(D|\textbf a, M_1, X)p(\textbf a|M_1, X)d\textbf a}{\int\int P(D|\textbf a, \textbf b, M_2, X)p(\textbf a|\textbf b, M_2, X)p(\textbf b|M_2, X)d\textbf ad\textbf b}$

Here I’m using capitals for the probability because I’m no longer talking about the specific sampling distribution but rather its value on the observed data $D$.

And there’s the thing that’s used here and that never ever shows up in frequentist statistics, as usual, which is the prior distribution for… well, everything. We have the prior odds ratio between the two models, and even if we’re just interested in which of the two models the data supports better, we still need the priors for the parameters themselves. And this method is, of course, more general than the case of two models that share a core of common parameters, with one model having extras; the models can have two completely different sets of parameters, and the same procedure applies.

What would a prior for the parameters look like? Of course, it depends on the case. One would expect, perhaps, that in the linear case described by my friend, they’d be normally distributed with mean $0$ (no idea which variance), or something. Usually at this point we just throw our arms up and do a non-Bayesian thing: choose something convenient. Note how I said that this is non-Bayesian. Picking and choosing priors that seem “reasonable” isn’t justifiable in the ideal case, where the prior would be uniquely determined by our state of knowledge, but it’s the best we can do in practice, lacking perfect insight into our own states of knowledge.
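
To make this concrete, here is a minimal sketch of the whole procedure for the intern example, simplified from the full linear model down to group means only (the synthetic data, the $N(0, 5)$ priors, the known noise $\sigma = 0.5$, and all the function names are my own made-up conveniences): the marginal likelihood $p(D|M)$ of each model is estimated by brute-force Monte Carlo, i.e. averaging the likelihood over samples drawn from the prior, which is exactly the integral above.

```python
import math
import random

random.seed(42)

def log_density(y, mu, sigma=0.5):
    """Log of the normal pdf with known noise sigma."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def logsumexp(vals):
    """Sum tiny likelihoods in log space without underflow."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

# Synthetic "two interns" data: intern B's measurements run systematically high.
data_a = [random.gauss(2.0, 0.5) for _ in range(15)]
data_b = [random.gauss(3.5, 0.5) for _ in range(15)]

def log_marginal(two_means, n=20000):
    """Monte Carlo estimate of log p(D|M): average the likelihood over the prior."""
    logliks = []
    for _ in range(n):
        if two_means:   # model 2: one mean per intern, each with its own N(0, 5) prior
            mu1, mu2 = random.gauss(0, 5), random.gauss(0, 5)
        else:           # model 1: a single common mean
            mu1 = mu2 = random.gauss(0, 5)
        ll = (sum(log_density(y, mu1) for y in data_a)
              + sum(log_density(y, mu2) for y in data_b))
        logliks.append(ll)
    return logsumexp(logliks) - math.log(n)

lm1 = log_marginal(two_means=False)
lm2 = log_marginal(two_means=True)
print(f"log p(D|M1) = {lm1:.1f}")
print(f"log p(D|M2) = {lm2:.1f}")
print(f"log Bayes factor for M2 = {lm2 - lm1:.1f}")
```

Note that the averaging over prior samples is what produces the automatic Occam penalty: model 2 wastes prior mass on regions where the extra parameter doesn't help, so it only wins when the data really do show two groups, as they do here.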

Alright. So now we have the posterior probability – or at least posterior odds – for the models. So do we just pick the one with the highest posterior and go with it?

Not quite. There are a few problems with this approach. First and foremost, this is a Decision Theory problem, and unless you care only about strict accuracy, there might be value in using a model you know to be wrong because it’s not too wrong and you can still draw useful inferences from it.

For example, suppose that instead of having that the difference between the two possible lines is because of different interns, it happens because a certain effect affects two populations differently. That would mean that, in order to estimate the effect in either population, you would have access to much less data than if you bundled up both populations and pretended they were under the exact same effect. And while this might sound dishonest, the difference between the two models might be small enough that the utility of having on average twice as many points of data is larger than the disutility of using a slightly incorrect model. But of course, there is no hard-and-fast rule about how to choose, or none that’s a consensus anyway, and this is sort of an intuitive choice to balance the tradeoff between fit and amount of data.

Another problem is that, even if you’re willing to actually just pick a model and run with it, maybe there is no preferred model. Maybe the posterior odds of these two models is one, or close enough. What do you do, then? Toss a coin?

No. Bayesian statistics has a thing called a mixed model. We want to make inference, right? So that basically means getting a distribution for future data based on past data: $p(d_f|D_p, X)$. We can, once more, use the law of total probability:

\begin{aligned}p(d_f|D_p, X) &= p(d_f|M_1, D_p, X)P(M_1|D_p, X) \\ &+ p(d_f|M_2, D_p, X)P(M_2| D_p, X) \\ &+p(d_f|\bar{M_1},\bar{M_2}, D_p, X)P(\bar{M_1}, \bar{M_2}|D_p, X) \end{aligned}

If we’re fairly confident that either model 1 or model 2 is the appropriate explanation for this data, i.e. $P(\bar{M_1}, \bar{M_2}|D_p,X)\approx 0$, then we can use only the first two terms. Even if we’re not, we can still get a good approximation. So we predict future data by taking a weighted average of the predictions of each model, where the weights are the posterior probabilities of those models.
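
A sketch of that weighted average, with assumed numbers (the 0.6/0.4 posterior weights and the two normal predictive distributions are illustrative, not computed from real data):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Assumed posterior weights of the two models.
w1, w2 = 0.6, 0.4

def predictive(y):
    """Mixture predictive: each model's prediction, weighted by its posterior."""
    return w1 * normal_pdf(y, 2.0, 0.5) + w2 * normal_pdf(y, 3.5, 0.5)

# The mixture hedges across both models instead of betting everything on one.
print(f"p(y = 2.0) = {predictive(2.0):.3f}")
print(f"p(y = 3.5) = {predictive(3.5):.3f}")
```

Neither model is discarded; a future observation near either model's prediction still gets substantial probability, in proportion to how seriously the data made us take that model.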

## The use of Less Wrong

I’ve been planning on writing a post along these lines, and the recent thing on tumblr about the LW community has given me just the right motivation and environment for it. Specifically this nostalgebraist post gave me the inspiration I needed. He described the belief-content of LW as either obvious, false, or benign self-help advice one can find in many other places.

Now, nostalgebraist isn’t a LWer. I am. So let me say what the belief-content of LW looks like, to me. Why do I think LW-type “rationality” is useful? What’s the use of it all? Is it just the norms of discourse?

And of course you have to take this with a grain of salt. I’m a LWer, so I’m severely biased in its favour, compared to baseline. And even nostalgebraist is pretty warm towards the community, or at least the tumblr community, so even his opinion is somewhat closer to positive than baseline. To properly avoid confirmation bias, the opinions of people who have had bad experiences with LW should be sought. I’ve seen quite a few on tumblr too, but none really outside of tumblr, so there’s also the set of biases that come from there. This paragraph is supposed to be your disclaimer: I’m not an objective outside observer. This is the view from the inside, rather: why I personally think LW is useful, and why I (partially) disagree with nostalgebraist.

I think my first problem is: nostalgebraist is smart. And he’s got a certain kind of smarts, one that I find with some frequency in LW, that makes him say stuff like “‘many philosophical debates are the results of disagreements over semantics’ — yeah, we know.” The first point is: we don’t. I don’t know if I’m too used to dealing with people outside of LW, or if he’s too used to dealing with people around as smart as he is, but this sort of thing is not, in fact, obvious. Points like “don’t argue over words” and “the map is not the territory” and “if you don’t consciously watch yourself you will likely suffer from these biases” aren’t obvious! Most people don’t get them! I didn’t get them before I read LW, and the vast majority of people I meet (from one of the 100 best engineering schools in the world) don’t know this!

LW-type “insights” are not, in fact, obvious to most people. Most people – and yes I’m including academics, scientists, mathematicians, whatever, people traditionally considered intelligent – do in fact spend most of their lives ignoring this completely. So I’ll get back to what exactly those insights may be later.

The second problem is… I also think he’s objectively wrong about what beliefs are actually common amongst LWers. Just take a look at the 2013 LW Survey Results. In fact, the website itself barely talks about FAI, so I don’t understand where the idea that Singularity-type beliefs are widespread comes from. Maybe it’s because no one outside of LW talks about FAI and the Singularity at all, and we talk a little about it? I dunno; my personal experience with LW is that much less than 0.5% of the time we spend talking is dedicated to this kind of discussion, and even belief in the Singularity/FAI is oftentimes permeated with qualifiers and ifs and buts. And even the hardcore Bayesian thing isn’t all that settled either.

At any rate, there’s much more to it than just that.


## Absence of evidence is evidence of absence

The W’s article about Evidence of Absence is confusing. They have an anecdote:

A simple example of evidence of absence: A baker never fails to put finished pies on her windowsill, so if there is no pie on the windowsill, then no finished pies exist. This can be formulated as modus tollens in propositional logic: P implies Q, but Q is false, therefore P is false.

But then they go on to say: “Per the traditional aphorism, ‘absence of evidence is not evidence of absence’, positive evidence of this kind is distinct from a lack of evidence or ignorance[1] of that which should have been found already, had it existed.[2]”

And at this point I go all ?????.

And then they continue with an Irving Copi quote: “In some circumstances it can be safely assumed that if a certain event had occurred, evidence of it could be discovered by qualified investigators. In such circumstances it is perfectly reasonable to take the absence of proof of its occurrence as positive proof of its non-occurrence.”

UM.

Alright so, trying to untangle this mess, they seem to want to make a qualitative distinction between “high-expectation evidence” and “low-expectation evidence.” Now, if you have read other stuff on this blog, like stuff about Bayes’ Theorem and the Bayesian definition of evidence and the many ways to look at probability and… Well, you must know by now that probability theory has no qualitative distinctions. Everything is quantitative. Any sharp divisions are strictly ad hoc and arbitrary and not natural clusters of conceptspace.

Thankfully, there is another quote in that W article that’s closer to the mark:

If someone were to assert that there is an elephant on the quad, then the failure to observe an elephant there would be good reason to think that there is no elephant there. But if someone were to assert that there is a flea on the quad, then one’s failure to observe it there would not constitute good evidence that there is no flea on the quad. The salient difference between these two cases is that in the one, but not the other, we should expect to see some evidence of the entity if in fact it existed. Moreover, the justification conferred in such cases will be proportional to the ratio between the amount of evidence that we do have and the amount that we should expect to have if the entity existed. If the ratio is small, then little justification is conferred on the belief that the entity does not exist. [For example] in the absence of evidence rendering the existence of some entity probable, we are justified in believing that it does not exist, provided that (1) it is not something that might leave no traces and (2) we have comprehensively surveyed the area where the evidence would be found if the entity existed…[5]
—J.P. Moreland and W.L. Craig, Philosophical Foundations for a Christian Worldview

This looks much more like Bayesian reasoning than the rest of that article did. But let’s delve deeper and see how to prove a negative.
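
As a first taste of what "delving deeper" looks like, here's a sketch of the elephant/flea update with made-up numbers (a flat 0.5 prior in both cases, and detection probabilities of 0.99 for the elephant and 0.01 for the flea): absence of evidence is always evidence of absence, but its strength scales with how likely the evidence was to show up.

```python
def posterior_after_no_sighting(prior, p_detect):
    """P(present | not seen), by Bayes' theorem.

    p_detect is P(we'd see it | it's there). Not seeing the entity is strong
    evidence of absence only when detection would have been likely."""
    p_miss_if_present = 1 - p_detect
    p_miss_if_absent = 1.0  # nothing to see if it isn't there
    num = p_miss_if_present * prior
    return num / (num + p_miss_if_absent * (1 - prior))

# Same 50% prior; very different detection probabilities.
elephant = posterior_after_no_sighting(prior=0.5, p_detect=0.99)
flea = posterior_after_no_sighting(prior=0.5, p_detect=0.01)

print(f"P(elephant | not seen) = {elephant:.4f}")
print(f"P(flea     | not seen) = {flea:.4f}")
```

Failing to see the elephant drops its probability to about 1%, while failing to see the flea barely moves it below 50%: the same quantitative machinery, no qualitative distinction required.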

## Beliefs and aliefs

What does it mean to believe?

This is not supposed to be some Deeply Wise prod to make someone write philosophical accounts of the mystical uniqueness of human consciousness or some such. It’s an actual question about the actual meaning of the actual word. Not that words have intrinsic meanings, of course, but what do we mean when we use this word?

And like many good words in the English language, it has a lot of meanings.

LessWrong has a lot of talk about this. Amongst the meanings of the verb “to believe” talked about in the linked Sequence are to anticipate an experience, to anticipate anticipating an experience, to cheer for a team, and to signal group membership. And of course, that’s not all. Some people in the atheist movement, for instance, use the word “belief” sometimes to refer to unjustified or faith-based models-of-the-world.

Now, there is a very interesting other word in philosophy and psychology: “alief.” To alieve something is to have a deep, instinctual, subconscious belief, and the word is used especially when this subconscious feeling is at odds with the conscious mind. The W uses a few examples to explain the concept, like the person who is standing on a transparent balcony and, in spite of believing themself safe, alieves the danger of falling.

This is a very interesting (and fairly obvious, after you grok the difference between your Systems 1 and 2) internal dichotomy. Ideally, we want our beliefs and aliefs to be identical, and whenever we change our beliefs we’d like to likewise change our aliefs. And I think much of what Yudkowsky means when he talks about making beliefs pay rent refers exactly to this concept, turning beliefs into aliefs. This would seem to be very useful for rationality in general – a large part of rationality techniques consists of a bunch of heuristics for turning conscious deliberations into intuitive judgements. And of course, it’s very hard to do.

Pascal’s Wager (the one that says that, on the off-chance that god does in fact exist and will punish you for not believing, you should believe in it) has lots of flaws in it, but I think this is a particularly severe one. Sure, maybe the human brain is absolutely and completely insane in how it translates beliefs into aliefs and vice-versa, but it seems to me that, most of the time, you can’t just, by an effort of will, force it to turn a belief into an alief. And Pascal himself admitted this, and said that what the rational person should do is act and behave as if they believed until they actually did. And I’m sure that would work with some people, eventually, in the sense that they’d believe they believe, they’d profess and cheer and wear their belief.

But I’ll be damned if any amount of praying will actually convince me, on the brink of death, that I’m about to meet the Creator.

Or some such, depending on which religion you’re talking about.

And one would think maybe a just god would reward honesty more than barefaced self-manipulation.

Whichever the case, you can’t just choose to anticipate experiences: either you do, or you don’t, for good or for ill. And the brain isn’t completely stupid, if it didn’t move somewhat according to evidence it would’ve been selected out of the gene pool a long time ago, but it’s not terribly efficient or smart about it, and its belief → alief translation procedure can be overridden by a lot of other modules, or twisted and hacked into unrecognisability. But it seems that, in general, a lot of rationality heuristics boil down to: okay, this is the normatively correct way to think – how do I internalise it?

I don’t know. It appears to take lots of practice or some such, and different kinds of belief require different kinds of alief-generating, and some people seem to be naturally better than others at this “taking ideas seriously” skill. But we all know that the whole of rationality isn’t limited to what Less Wrong has to offer, and as further research is done, well, I’d be eager to learn how to more efficiently internalise my beliefs.


## Learning Bayes [part 1]

I have a confession to make.

I don’t actually know Bayesian statistics.

Or, any statistics at all, really.

Shocking, I know. But hear me out.

What I know is… Bayesian theory. I can derive Bayes’ Theorem, and I also can probably derive most results from it. I’m a good mathematician. But I haven’t actually spent any time doing practical Bayesian statistics stuff. Very often a friend, like raginrayguns, tells me about a thing, a super cool thing, or asks me a thing, a super confusing thing, and it sort of goes over my head and I have to really do stuff from scratch to figure out my way. I don’t have the heuristics, I don’t have all the techniques of problem-solving perfectly available to me.

For example, earlier today another friend of mine came up to me and asked me a question about the difference between Bayesian and frequentist statistics. Basically, he has a bunch of data about a lot of bridges, and then four pieces of information about each of them: cost, material, length, and astrological sign of the designer of the bridge. He wanted – I had to ask a lot of questions to figure this out because, as I said, I don’t do statistics yet, I don’t know the jargon – to find the posterior distribution for the cost of his projected bridge, given a material, a length, and his astrological sign. Or rather, he wanted the Bayesian answer, because he knew the frequentist one already.

Let me pause this a bit, and talk about another problem.

## Bayesian falsification and the strength of a hypothesis

At the end of my post about other ways of looking at probability, I showed you a graph of evidence against probability. This is the relevant graph:

Looking at this graph was one of the most useful things I’ve ever done as a Bayesian. It shows, as I explained, exactly where most of the difficulty is in proving things, in coming up with hypotheses, etc. Another interesting aspect is the symmetry, showing that, to a Bayesian, confirmation and disconfirmation are pretty much the same thing. Probability doesn’t have a prejudice for or against any hypothesis; you just gather evidence and slide along the graph. Naïvely, the concept of Popperian falsification doesn’t look terribly useful or relevant or particularly true.
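
The evidence axis on that kind of graph is log-odds. A quick sketch of the conversion, using the decibel convention (10·log₁₀ of the odds, as in Jaynes) to make the symmetry visible:

```python
import math

def evidence_db(p):
    """Evidence for a hypothesis held with probability p, in decibels of odds."""
    return 10 * math.log10(p / (1 - p))

# Symmetry: confirmation and disconfirmation mirror each other around p = 0.5,
# and each extra "nine" of certainty costs roughly another 10 dB of evidence.
for p in (0.1, 0.5, 0.9, 0.99, 0.999):
    print(f"p = {p:<5}  evidence = {evidence_db(p):+6.1f} dB")
```

Moving from 0.5 to 0.9 takes exactly as much evidence as moving from 0.5 down to 0.1 takes counter-evidence, which is the symmetry the graph displays.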

So whence comes the success of the idea?
