February 27th, 2012
Bayesianism's Metaphysical Baggage
1. The Itinerary:
Bayesianism is a system for thinking rationally about new information. It gives a method for updating beliefs in light of new evidence, but does not provide prior probabilities. If we wish to use Bayesianism to reason about actual or hypothetically actual events we require a prior probability distribution. It is not clear that all prior probability distributions are rational; and arbitrary prior probability assignments, which lack any reason to choose one over another, are unsatisfying. I will argue that the probability distributions which can be endorsed by a finite agent commit the practising Bayesian to an uncommon metaphysical claim on the cardinality of the set of possible worlds.
According to the subjective interpretation of probability used in Bayesianism, the probability of an event occurring is a measure of the degree of belief that the event will happen which it is rational to hold. This 'probability-of' as 'confidence-in' is conventionally termed credence. Notably, on this account probabilities are not in general properties of events, but are rather relationships between an agent's attitudes and a hypothetical event.
The Bayesian approach gives a rational method for updating credences in light of new evidence. However, it does not provide initial credences. Bayesianism only provides a method to rationally update an already existent set of credences, a prior probability distribution; it cannot generate subjective probabilities from scratch. If an agent wishes to use Bayesianism to evaluate claims in light of evidence (that is, to reason), then the agent must begin with some prior probability distribution acquired without the help of Bayes.
The Bayesian program is motivated by the desire to be rational, to reason optimally or at least well. One rule we might rationally want our probability assignments to cohere with is regularity, roughly: only the impossible should be given a credence of zero, and only the necessary a credence of one. Failing to conform to regularity opens an agent up to a Dutch book, whereby the agent's otherwise rational decision making leads to a certain loss.
In practice agents are not concerned with having rational credence-value-assignments for any and all arbitrary propositions, but are concerned with finding rational credence assignments for specific explicitly stated propositions, while ensuring these assignments are consistent with the rules derived in the arbitrary case. An agent can only be concerned about questions that can be asked. For example if a question cannot be expressed with anything less than an infinite symbol string, then the question in its entirety cannot be understood by a (finite) agent, and thus the answer to the question is utterly uninformative for that agent.
The logic of probability, including in particular Bayesian subjective probability, is captured in the mathematical concept of a probability space. Not all probability spaces permit credence assignments that cohere with regularity. I will suggest a restriction on which probability spaces we should consider, which will allow us to retain regularity.
Further I will argue from the theoretical limits of an agent's concern that the exercise of Bayesianism is not compatible with any but the aforementioned restricted set of probability spaces. Endorsing only these acceptable probability spaces amounts to a metaphysical commitment on the cardinality of possible worlds.
2. The Trip:
A probability space is an ordered triple, usually written (Ω, F, P). The set Ω is a “sample space”, a non-empty set of possible states; for my purposes we can call Ω the set of possible worlds. The σ-algebra F is a subset of the powerset of Ω (which is to say, the elements of F are subsets of Ω), and its elements correspond to propositional claims. For example, the proposition that “it will rain tomorrow” is equivalent to the claim that the actual world belongs to the subset of possible worlds in which it rains tomorrow; that is, the actual world belongs to a certain element of F. This interpretation is in line with, and typically credited to, Lewis:
“I identify propositions with certain properties - namely, with those that are instantiated only by entire possible worlds. Then if properties generally are the sets of their instances, a proposition is a set of possible worlds. A proposition is said to hold at a world, or to be true at a world. The proposition is the same thing as the property of being a world where that proposition holds; and that is the same thing as the set of worlds where that proposition holds. A proposition holds at just those worlds that are members of it.” (1986, 53-54)
P is a relation taking elements of F to unique elements of some probability assignment range (no element of F is assigned more than one probability). Typically P is assumed to be a function from F to the real interval [0,1]. I will not assume that the range of P is [0,1] in the reals, nor that P is defined on all of F. However, I will assume, in line with the Kolmogorov axiomatization, that the range of P is bounded below and above by 0 and 1 respectively, and that P(Ø)=0 while P(Ω)=1. P embodies the credence assignments of a Bayesian agent, and on the subjective account of probability it is different for each agent. We now have the terminology requisite for a concise account.¹
In probability space terminology, Bayesianism gives rules about how an agent ought to modify their probability assignment function, P, in light of new evidence. In particular, new evidence rules out all the otherwise possible worlds in which the agent does not acquire the evidence in question. This ruling-out in turn ought to change the credence assignments given to the elements of F. Bayesianism prescribes how to update these credence assignments in a rational way. Crucially, Bayesianism does not provide a probability assignment function. Bayesian inference must be fed a seed function, an initial P of a priori probability assignments. “If not supplemented by the initial probability distribution [that is: P] over this family [that is: F], the framework is useless; if so supplemented it is sufficient.” (de Finetti 1969, 13)
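The ruling-out described above can be illustrated on a toy finite sample space. The following is only a minimal sketch: the four worlds, the prior values, and the helper names `condition` and `credence` are hypothetical choices made for illustration, and the prior is simply stipulated, since Bayesianism itself does not supply one.

```python
from fractions import Fraction

def credence(dist, proposition):
    """Credence in a proposition, taken as a set of worlds."""
    return sum(dist.get(world, Fraction(0)) for world in proposition)

def condition(prior, evidence):
    """Drop the worlds ruled out by `evidence` and renormalize the rest."""
    total = credence(prior, evidence)
    if total == 0:
        raise ValueError("cannot condition on evidence with zero prior credence")
    return {w: p / total for w, p in prior.items() if w in evidence}

# Hypothetical prior over four possible worlds (an assumption, not derived).
prior = {
    "rain-cold": Fraction(3, 10),
    "rain-warm": Fraction(1, 10),
    "dry-cold": Fraction(2, 10),
    "dry-warm": Fraction(4, 10),
}
rain = {"rain-cold", "rain-warm"}   # the proposition "it rains"
cold = {"rain-cold", "dry-cold"}    # the evidence "it is cold"

posterior = condition(prior, cold)
print(credence(prior, rain))      # credence in rain before the evidence
print(credence(posterior, rain))  # credence in rain after the evidence
```

The update step is fully determined once `prior` is given; nothing in `condition` says where `prior` should come from, which is the gap the rest of the argument turns on.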
Many attempts to provide a priori probability distributions have been made. Perhaps the most intuitive is the Laplacian Principle of Indifference,² which states that, given a set of mutually exclusive possible outcomes, if we have no positive reason to expect any one outcome over another, we should assign each possible outcome the same credence. This is unproblematic in the finite case: if there are n possible outcomes, we simply assign a credence of 1/n to each outcome. But if there are infinitely many possible outcomes there is not in general a non-arbitrary uniform probability distribution. If there are infinitely many alternative outcomes and we assign the same positive credence to each alternative, then the probability that one of the outcomes in question obtains, P(Ω), is infinite and not bounded by 1. The properties of any distribution on an infinite collection of disjoint sets in F which avoids infinite probabilities will depend on the particular parametrization of the sample space.
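The contrast between the finite and infinite cases can be written out as a short worked calculation. This is a sketch under stated assumptions: the outcomes are labelled ω_1, ω_2, ... (labels introduced here for illustration), P is taken to be at least finitely additive, and the final line additionally assumes countable additivity, which the text does not explicitly adopt.

```latex
% Finite case: n mutually exclusive outcomes, each given credence 1/n.
\[
  P(\Omega) \;=\; \sum_{i=1}^{n} P(\{\omega_i\}) \;=\; n \cdot \frac{1}{n} \;=\; 1 .
\]
% Infinite case: every outcome receives the same credence \varepsilon > 0.
% Finite additivity already pushes some finite union above 1:
\[
  P(\{\omega_1\} \cup \dots \cup \{\omega_N\})
  \;=\; \sum_{i=1}^{N} P(\{\omega_i\}) \;=\; N\varepsilon \;>\; 1
  \qquad \text{whenever } N > 1/\varepsilon .
\]
% If instead \varepsilon = 0, countable additivity (an extra assumption) gives
\[
  P(\Omega) \;=\; \sum_{i=1}^{\infty} 0 \;=\; 0 \;\neq\; 1 .
\]
```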
To be regular is to assign a probability of zero to the empty-outcome and only to the empty-outcome, motivated since no possible world corresponds to any impossible event, and to assign a probability of one only to the set of all possible worlds, motivated since whatever happens will be something that was possible (even if it is only apparent in retrospect). Besides this appealing motivation, one can devise Dutch book arguments in favour of regularity. Suppose I am about to select a random rational number between zero and one, say by throwing at a rational number line one of those astounding darts standard in the thought-experimentalist's kit. Further, I enumerate the rationals in [0,1], calling them x_0, x_1, x_2, ..., x_n, .... I offer you a series of bets: if the dart lands exactly on x_n, you owe me $2; otherwise I owe you $1/2^n. Since, against regularity, you have assigned P(x)=0 for any point x belonging to [0,1] ∩ Q, your expected value for the nth bet is $(1/2^n)∙(1-0) - $2∙0 = $1/2^n > 0; thus rationally you take the bet, and similarly each subsequent bet. But some x_k will be hit; thus you will lose the kth bet and will owe me $2, while you will win every other bet and so I will owe you $(1 + 1/2 + 1/4 + ...) - $1/2^k = $(2 - 1/2^k). But then on accepting the Dutch book you are guaranteed to lose $1/2^k for some k. So regularity seems necessary for rationality.
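The arithmetic of this Dutch book can be checked with exact fractions. The sketch below assumes the bets are indexed from n = 0, so that the payouts run $1, $1/2, $1/4, ... as in the sum above; the function names are hypothetical. Since a program cannot place infinitely many bets, `net_truncated` sums a finite prefix, and `net_limit` uses the closed form of the geometric series.

```python
from fractions import Fraction

STAKE = Fraction(2)  # what the gambler owes if the dart hits x_n

def payout(n):
    """What the bookie owes the gambler on bet n if the dart misses x_n."""
    return Fraction(1, 2**n)

def expected_value(n, credence_hit=Fraction(0)):
    """Gambler's expected value for bet n, given their credence that x_n is hit."""
    return payout(n) * (1 - credence_hit) - STAKE * credence_hit

def net_truncated(k, horizon):
    """Gambler's net over bets 0..horizon-1 when the dart hits x_k."""
    if not 0 <= k < horizon:
        raise ValueError("k must index one of the bets placed")
    winnings = sum(payout(n) for n in range(horizon) if n != k)
    return winnings - STAKE

def net_limit(k):
    """Net over all bets: the full series sums to 2, less the one lost bet."""
    return (Fraction(2) - payout(k)) - STAKE

print(expected_value(5))      # positive, so each bet looks acceptable
print(net_truncated(3, 10))   # negative for any finite prefix of bets
print(net_limit(3))           # the guaranteed loss of 1/2^k
```

With credence zero in every hit, every individual bet has positive expected value, yet the net is negative whichever x_k the dart lands on.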
Nevertheless, regularity seems difficult to achieve. In the above Dutch book argument the gambler has no reason to believe any one rational number is more likely than any other. Further, the x_n enumeration was arbitrarily performed without the gambler's oversight, so any permutation of the labels x_n is informationally equivalent for the gambler. Thus by symmetry any reordering should be given the same probability distribution, and so the probability distribution must be uniform. But as we have already informally observed, the gambler is then forced to assign to each rational point of [0,1] a probability of zero, exactly the assignment that opened the gambler up to Dutch booking.
“In order for there to be . . . regularity, there has to be a certain harmony between the cardinalities of P’s domain—namely F—and P’s range. If F is too large relative to P’s range, then a failure of regularity is guaranteed, and this is so without any further constraints on P. . . . Indeed, any probability function defined on an uncountable algebra assigns probability 0 to uncountably many propositions, and so in that sense it is very irregular. (See Hájek 2003 for proof.)” (Hájek (preprint),19)
If we wish to save regularity, and thus rationality, we must be careful in choosing F and P, and by extension Ω, since our choice of Ω will inevitably affect what options are available for F (since F is defined as a collection of subsets of Ω), and what options are suitable for the range of P. I will argue that in practice and interpretation, our choice of Ω puts a limitation on our choice of the range of P once we recognize that the agents using P must belong to some world in Ω.
Again from Hájek: “Pruss (MS) generalizes this observation. Assuming the axiom of choice, he shows that if the cardinality of Ω is greater than that of the range of P, and this range is totally ordered, then regularity fails: either some subset of Ω gets probability 0, or some subset gets no probability whatsoever. The upshot is that non-totally ordered probabilities are required to save regularity—a departure from orthodoxy so radical that I wonder whether they deserve to be called 'probabilities' at all.” (Hájek (preprint), 20)³
So our options, if we wish to save regularity, are: to restrict the cardinality of F to no larger than the cardinality of the chosen range of P; to expand the cardinality of the range of P to match a chosen F; or, as Hájek suggests as a final alternative, to give up on totally ordered probabilities.
A probability space that is not totally ordered contains propositions with well defined probabilities which are not all comparable. In a non-totally ordered space we may know the probabilities of events x and y, yet not know which one is more probable, since in such a space the very question of which is more probable may not have an answer, even though both have well defined probabilities. An agent offered a choice between two bets, one offering $1 if x obtains and the other $1 if y obtains, would have no means to compare the two bets, even with perfect reasoning ability and complete probabilistic knowledge. Although the two bets have the same payoff and differ only in their odds, there is no fact of the matter as to which is the better bet. But this seems against all intuition about what probability is supposed to be about. Given that the Bayesian program is to find rules for rational thought (in particular 'credence updating'), I posit that we can dismiss non-totally ordered 'probabilities' as simply not the object we wish to study.
This leaves judicious selection of F and P as our only recourse. Given some F we may choose to define P so that it has a range with a large enough cardinality to allow regularity. But as Bayesian agents we want to assign some definite probability, some element of the range of P, to each element of F presented to us. We want to be able to evaluate the function P at arguments f ∈ F. Therefore, if we wish ever to use Bayesianism, as is the motivation for its development, P must be a computable function.⁴ Thus the range of P must be a set of computable numbers. But the computable numbers are countably infinite, hence the range of P must be at most countably infinite. Then merely from the fact that we, along with all other plausible agents of concern, are finite, a predetermined limit on the maximum cardinality of the range of P arises. We thus find that the range of P has a fixed maximum cardinality, and so we are driven to the last remaining possibility for saving regularity.
Having determined an upper bound on the cardinality of the range of P that can be rationally postulated, the only hope to save regularity is to restrict the cardinality of F. F is defined to be a subset of the power set of Ω, so the cardinality of Ω puts a limit on how large F can be. Thus if we wish to find a rational restriction on the size of F, it will be worthwhile to see what restrictions may be placed on Ω.
In the probability space triple, Ω is supposed to be the space of possible worlds, but in what sense of possible worlds? For the Bayesian subjectivist agent, Ω is no more than the space of worlds whose possibility the agent is willing or able to entertain. For some, Ω is rather large. Indeed for Pruss “we have a reductio ad absurdum of the assumption that the collection of all possible worlds forms a set” (2001, 171-2). Pruss holds that Ω is too large to be a Cantorian set, and must in fact be a class. Such a result leaves F ill-defined, as there is no such thing as the powerset of a class, and I find it difficult to see how a finite agent can distinctly entertain the possibilities of an entire proper class of states. Lewis is less extreme on the issue than Pruss,⁵ but nevertheless endorses a cardinality for Ω of at least 2^c, where c is the cardinality of the continuum:
“But it is easy to argue that there are more than continuum many possibilities we ought to distinguish. It is at least possible to have continuum many spacetime points; each point may be occupied by matter or may be vacant; since anything may coexist with anything, any distribution of occupancy and vacancy is possible; and there are more than continuum many such distributions.” (Lewis 1986, 143)
Both Pruss and Lewis are making metaphysical modal claims about possibility. I suggest that for the Bayesian these are the wrong kinds of claims. General metaphysical claims of potentially or 'actually' manifest higher cardinalities are inert with respect to the decision making of finite agents, and are thus irrelevant to the pursuit of rationality whatever other interest they may hold.
I suggest that in a Bayesian account Ω should be viewed as the set of all worlds (all states of being) that can be countenanced by an agent, and which are therefore necessarily compatible with those things the agent assumes as axiomatic. The elements of Ω, the possible worlds, are those abstractions of worlds which it is possible for an agent to rationally believe correspond to the actual world. In particular they are definable: if an agent can hold a belief about a possible world, then the agent must be able to reference it, either by naming it or by giving it a unique description.⁶ But whatever definitional or naming convention is chosen, only countably many objects may be referenced by an agent, since the names or definitions may be enumerated, say by counting them off in 'dictionary order'. Further, if “a proposition is a set of possible worlds” (Lewis 1986, 53), then as only countably many propositions can be stated, only countably many sets of possible worlds may be referenced by an agent. Thus only a countable subset of F can ever be accessible to an agent for consideration.
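The 'dictionary order' enumeration can be made concrete. The sketch below assumes descriptions are finite non-empty strings over a finite alphabet and lists them shortest first, alphabetically within each length; the function names and the two-letter alphabet are hypothetical. Every description receives a finite index, which is all that countability requires.

```python
from itertools import count, islice, product

def descriptions(alphabet):
    """Yield every finite non-empty string over `alphabet`,
    shortest first and in alphabetical order within each length."""
    symbols = sorted(set(alphabet))
    for length in count(1):
        for letters in product(symbols, repeat=length):
            yield "".join(letters)

def index_of(description, alphabet):
    """0-based position of `description` in the enumeration."""
    symbols = set(alphabet)
    if not description or any(ch not in symbols for ch in description):
        raise ValueError("description must be a non-empty string over the alphabet")
    for i, candidate in enumerate(descriptions(alphabet)):
        if candidate == description:
            return i

print(list(islice(descriptions("ab"), 7)))
print(index_of("ba", "ab"))
```

Plain alphabetical ordering without the length-first rule would not work here, since infinitely many strings beginning with the first symbol would precede any string beginning with the second.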
Therefore, if P is to be evaluated by the agent, then it cannot be defined on more than a countable subset of F. Such evaluation ought to be possible, since the motivation for the formalization has P embodying the agent's credence assignments; that is to say, P is formed by collecting the agent's credences, and so it is an effect, not a cause.
Thus we find there are independent reasons for limiting the sizes of Ω and F, while the range of P may be as large as countably infinite. We can therefore escape Pruss's result on the relationship between regularity and the cardinalities of F and the range of P.
3. The Return Home:
Pruss shows that the desire for rationality conflicts with the endorsement of some probability space triples (Ω, F, P), depending on the cardinalities of the three entries. But I argue that in any case a real Bayesian agent cannot be concerned with more than a countable subset of Ω, a countable subset of F, and a countable restriction of a computable P. Thus, guided by a particular problem of rationality, namely regularity, I argue that given a plausible set of possible worlds Ω, and a set of subsets for consideration F, a Bayesian agent may freely, and rationally, restrict their attention to the countable portions of the probability space about which questions can be asked and answers can be compared. Anything beyond those restrictions is utterly inert with respect to decision making, belief formation, credence assignment, and behaviour; it is thus outside the scope of the Bayesian program and in fact outside the scope of any normative account of probability. It remains to be proven that regularity, the motivating problem, is in fact achievable on this account, but we have escaped at least one daunting attempted proof of its impossibility, and I have argued a substantive claim about the kinds of possibilities a finite Bayesian agent can consider, and the assignments of possibility they can endorse.
de Finetti, Bruno. Initial Probabilities: A Prerequisite for Any Valid Induction. Synthese, Vol. 20, No. 1 (Jun., 1969), pp. 2-16.
Hájek, Alan. Staying Regular? Pre-print.
Hájek, Alan. What Conditional Probability Could Not Be. Synthese, Vol. 137, No. 3 (Dec., 2003), pp. 273-323.
Jaynes, E.T. Probability Theory: The Logic of Science. New York: Cambridge University Press,
Lewis, David. On the Plurality of Worlds. Oxford: Basil Blackwell Ltd., 1986.
Li, Ming and Paul Vitanyi. An Introduction to Kolmogorov Complexity and its Applications. New York: Springer-Verlag, 1997.
Pruss, Alexander. Regularity and Cardinality. (MS)
Pruss, Alexander. The Cardinality Objection to David Lewis's Modal Realism. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, Vol. 104, No. 2 (May, 2001), pp. 169-178.
¹ The preceding account is an original explanation aided by reference to Li and Vitanyi's excellent text on the subject.
² See Jaynes, pp. 201-215, for a first-principles motivation and derivation of the finite-case form of Laplace's principle.
³ As of this writing, Pruss now claims, on his personal website, to have found a proof of his result without relying on the axiom of choice.
⁴ Here I assume the Church-Turing thesis.
⁵ I will not directly address Lewis's and Pruss's arguments here, as they are outside the scope of this paper, but I mention them as a relevant acknowledgement of how widely philosophical opinions on this question differ, and to suggest a slightly different approach to the question.
⁶ Here I implicitly take the Leibnizian Principle of the Identity of Indiscernibles. If two or more possible worlds share every possible description, then with respect to the describing agent the worlds are equivalent and cannot be coherently distinguished.