Tim Put
February 27th, 2012
Bayesianism's Metaphysical Baggage
1. The Itinerary:
Bayesianism is a
system for thinking rationally about new information. It gives a
method for updating
beliefs in light of new evidence, but does not provide prior
probabilities. If we wish to use Bayesianism to reason about actual
or hypothetically actual events we require a prior probability
distribution. It is not clear that all prior probability
distributions are rational; and arbitrary prior probability
assignments, which lack any reason to choose one over another, are
unsatisfying. I will argue that the probability distributions which
can be endorsed by a finite agent commit the practising Bayesian to
an uncommon metaphysical claim on the cardinality of the set of
possible worlds.
According to the
subjective interpretation of probability used in Bayesianism, the
probability of an event is a measure of the degree of belief in its
occurrence which it is rational to hold. This 'probability-of' as
'confidence-in' is conventionally termed credence. Notably, on this
account probabilities are not in general properties of events, but
are rather relations between an agent's attitudes and a hypothetical
event.
The Bayesian
approach gives a rational method for updating credences in light of
new evidence. However, it does not provide initial credences.
Bayesianism only provides a method to rationally update an already
existent set of credences, a prior probability distribution; it
cannot generate subjective probabilities from scratch. If an agent
wishes to use Bayesianism to evaluate claims in light of evidence,
that is, to reason, then the agent must begin with some prior
probability distribution acquired without the help of Bayes.
The Bayesian
program is motivated by the desire to be rational, to reason
optimally or at least well. One rule we might rationally want our
probability assignments to cohere with is regularity, roughly: only
the impossible should be given a credence of zero, and only the
necessary a credence of one. Failing to conform to regularity opens
an agent up to a Dutch book, whereby the agent's otherwise rational
decision making leads to a certain loss.
In practice agents
are not concerned with having rational credence-value-assignments for
any and all arbitrary propositions, but are concerned with finding
rational credence assignments for specific explicitly stated
propositions, while ensuring these assignments are consistent with
the rules derived in the arbitrary case. An agent can only be
concerned about questions that can be asked. For example, if a
question cannot be expressed with anything less than an infinite
symbol string, then the question in its entirety cannot be understood
by a (finite) agent, and thus the answer to the question is utterly
uninformative for that agent.
The logic of
probability, including in particular Bayesian subjective probability,
is captured in the mathematical concept of a probability space. Not
all probability spaces permit credence assignments that cohere with
regularity. I will suggest a restriction on which probability spaces
we should consider, which will allow us to retain regularity.
Further I will
argue from the theoretical limits of an agent's concern that the
exercise of Bayesianism is not compatible with any but the
aforementioned restricted set of probability spaces. Endorsing only
these acceptable probability spaces amounts to a metaphysical
commitment on the cardinality of possible worlds.
2. The Trip:
A probability space
is an ordered triple usually written as (Ω, F, P). The set Ω is a
“sample space”, a non-empty set of possible states; for my
purposes we can call Ω the set of possible worlds. The σ-algebra F
is a subset of the powerset of Ω (which is to say: the elements of F
are subsets of Ω), and its elements correspond to propositional
claims. For example, the proposition that “it will rain tomorrow” is
equivalent to the claim that the actual world belongs to the subset
of possible worlds in which it rains tomorrow, that is, that the
actual world belongs to a certain element of F. This interpretation
is in line with, and typically credited to, Lewis:
“I identify
propositions with certain properties - namely, with those that are
instantiated only by entire possible worlds. Then if properties
generally are the sets of their instances, a proposition is a set of
possible worlds. A proposition is said to hold at a world, or to be
true at a world. The proposition is the same thing as the property
of being a world where that proposition holds; and that is the same
thing as the set of worlds where that propositions holds. A
proposition holds at just those worlds that are members of it.”
(1986, 53-54)
P is a relation taking elements of F into unique
elements (no element of F is assigned more than one probability) of
some probability assignment range. Typically P is assumed to be a
function from F to the real interval [0,1]. I will not assume that
the range of P is [0,1] in the reals, nor that it is defined on all
of F. However, I will assume, in line with the Kolmogorov
axiomatization, that the range of P is bounded below and above by 0
and 1 respectively, and that P(Ø)=0, while P(Ω)=1. P embodies the
credence assignments of a Bayesian agent, and is different for each
agent on the subjective account of probability. We now have the
terminology requisite for a concise account[1].
In probability
space terminology, Bayesianism gives rules about how an agent ought
to modify their probability assignment function P in light of new
evidence. In particular, new evidence rules out all the otherwise
possible worlds in which the agent does not acquire the evidence in
question. This ruling-out in turn ought to change the credence
assignments given to the elements of F. Bayesianism prescribes how to
update these credence assignments in a rational way. Crucially
Bayesianism does not provide a probability assignment function.
Bayesian inference must be fed a seed function, an initial P of a
priori probability assignments. “If not supplemented by the initial
probability distribution [that is: P] over this family [that is: F],
the framework is useless; if so supplemented it is sufficient.” (de
Finetti 1969, 13)
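A minimal sketch of this picture, assuming a toy sample space of four possible worlds. The names `OMEGA`, `PRIOR`, `credence` and `update`, and the prior credences themselves, are hypothetical illustrations and not part of the account above. Propositions are sets of worlds, and updating rules out the worlds incompatible with the evidence and renormalizes what remains. Note that the prior has to be supplied by hand, which is exactly de Finetti's point.

```python
from fractions import Fraction

# Hypothetical toy sample space: four "possible worlds", labelled by
# (weather tomorrow, what the forecast said).
OMEGA = frozenset({
    ("rain", "forecast_rain"),
    ("rain", "forecast_dry"),
    ("dry", "forecast_rain"),
    ("dry", "forecast_dry"),
})

# A prior P, given as a credence for each world. These numbers are made up;
# Bayesianism itself does not supply them.
PRIOR = {
    ("rain", "forecast_rain"): Fraction(3, 10),
    ("rain", "forecast_dry"): Fraction(1, 10),
    ("dry", "forecast_rain"): Fraction(1, 10),
    ("dry", "forecast_dry"): Fraction(5, 10),
}


def credence(p, proposition):
    """Credence in a proposition, i.e. in a set of possible worlds."""
    return sum((p[w] for w in proposition), Fraction(0))


def update(p, evidence):
    """Conditionalize: rule out worlds outside `evidence`, renormalize the rest."""
    total = credence(p, evidence)
    if total == 0:
        raise ValueError("cannot condition on evidence with credence zero")
    return {w: (p[w] / total if w in evidence else Fraction(0)) for w in p}


rain = frozenset(w for w in OMEGA if w[0] == "rain")
forecast_rain = frozenset(w for w in OMEGA if w[1] == "forecast_rain")

posterior = update(PRIOR, forecast_rain)
print(credence(PRIOR, rain))      # 2/5
print(credence(posterior, rain))  # 3/4
```

Exact fractions are used so that the credences sum to exactly one before and after the update.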
Many attempts to
provide a priori probability distributions have been made. Perhaps
the most intuitive is the Laplacian Principle of Indifference[2],
which states that: given a set of mutually exclusive possible
outcomes, if we have no positive reason to expect any one outcome
over another, we should assign each possible outcome the same
credence. This is unproblematic in the finite case; if there are n
possible outcomes, we simply assign credences of 1/n to each outcome.
But if there are infinitely many possible outcomes there is not in
general a non-arbitrary uniform probability distribution. For if
there are infinitely many alternative outcomes and we assign some
positive credence identically to each alternative, then the
probability of one of the outcomes in question obtaining, P(Ω), is
infinite and not bounded by 1. The properties of any distribution on an
infinite set of disjoint sets in F which avoids infinite
probabilities will depend on the particular parametrization of the
sample space.
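The failure of uniformity can be written out as a short worked derivation. This is a sketch under two assumptions the text does not state explicitly: that the mutually exclusive outcomes can be listed as ω_1, ω_2, ..., and that P is real-valued and additive. Here ε is a hypothetical name for the common credence given to each outcome.

```latex
% Case 1: a common positive credence. Finite additivity already suffices.
\[
P(\Omega) \;\ge\; P(\{\omega_1,\dots,\omega_N\})
\;=\; \sum_{n=1}^{N} \varepsilon \;=\; N\varepsilon \;>\; 1
\qquad \text{whenever } N > 1/\varepsilon .
\]
% Case 2: a common credence of zero, assuming countable additivity.
\[
P(\Omega) \;=\; \sum_{n=1}^{\infty} P(\{\omega_n\}) \;=\; \sum_{n=1}^{\infty} 0 \;=\; 0 \;\ne\; 1 .
\]
```

Either way the requirement P(Ω)=1 fails, so no uniform assignment over the listed outcomes is available.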
To be regular is to
assign a probability of zero to the empty-outcome and only to the
empty-outcome, motivated since no possible world corresponds to any
impossible event, and to assign a probability of one only to the set
of all possible worlds, motivated since whatever happens will be
something that was possible (even if it is only apparent in
retrospect). Besides this appealing motivation, one can devise Dutch
book arguments in favour of regularity. Suppose I am about to select
a random rational number between zero and one, say by throwing at a
rational number line one of those astounding darts standard in the
thought-experimentalist's kit. Further, I enumerate the rationals in
[0,1], calling them \(x_0, x_1, \dots, x_n, \dots\). I offer you a
series of bets, one for each n: if the dart lands exactly on \(x_n\),
you owe me $2; otherwise I owe you \(1/2^n\) dollars. Since, against
regularity, you have assigned \(P(x) = 0\) for every point x belonging
to [0,1] in Q, your expected value in dollars for the nth bet is
\(\tfrac{1}{2^n}(1-0) - 2 \cdot 0 = \tfrac{1}{2^n} > 0\); thus rationally
you take the bet, and similarly each subsequent bet. But one such
\(x_k\) will be hit, so you will lose the kth bet and owe me $2, while
you will win every other bet, and so I will owe you
\(\sum_{n \ne k} \tfrac{1}{2^n} = \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \dots\right) - \tfrac{1}{2^k} = 2 - \tfrac{1}{2^k}\)
dollars. On accepting the Dutch book you are therefore guaranteed to
lose \(1/2^k\) dollars for some k. So regularity seems necessary for
rationality.
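The arithmetic of this Dutch book can be checked with a short sketch. It assumes the bets are indexed from n = 0, so that the winnings begin at $1 as in the sum above, and it truncates the series to finitely many bets, since a program cannot place infinitely many. The names `stake`, `expected_value` and `net_payoff` are hypothetical. Under truncation the bettor's loss is slightly larger than 1/2^k dollars and approaches it as the number of bets grows.

```python
from fractions import Fraction

LOSS_IF_HIT = Fraction(2)  # dollars the bettor pays if the dart hits x_n


def stake(n):
    """Dollars the bettor wins on bet n when the dart misses x_n."""
    return Fraction(1, 2 ** n)


def expected_value(n, credence_in_hit=Fraction(0)):
    """Bettor's expected value for bet n, given their credence that x_n is hit."""
    return stake(n) * (1 - credence_in_hit) - LOSS_IF_HIT * credence_in_hit


def net_payoff(k, num_bets):
    """Bettor's net result over bets 0..num_bets-1 when the dart hits x_k."""
    winnings = sum((stake(n) for n in range(num_bets) if n != k), Fraction(0))
    return winnings - LOSS_IF_HIT


# Each bet looks favourable in isolation when the credence in a hit is zero,
# yet the combined result is negative whichever x_k is hit.
for k in (0, 3, 10):
    print(k, expected_value(k) > 0, net_payoff(k, num_bets=50) < 0)
```

With exact fractions, every truncated outcome comes out strictly negative, which is the guaranteed loss the argument relies on.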
Nevertheless,
regularity seems difficult to achieve. In the above Dutch book
argument the gambler has no reason to believe any one rational number
is more likely than any other. Further, the enumeration \(x_n\) was
performed arbitrarily and without the gambler's oversight, so any
permutation of the labels \(x_n\) is informationally equivalent for
the gambler. Thus by symmetry any reordering should be given the same
probability distribution, and thus the probability distribution must
be uniform. But as we have already informally observed, the gambler
is then forced to assign to each rational point of [0,1] a
probability of zero, exactly the assignment that opened the gambler
up to Dutch booking.
From Hájek:
“In order for
there to be . . . regularity, there has to be a certain harmony
between the cardinalities of P’s domain—namely F—and P’s
range. If F is too large relative to P’s range, then a failure of
regularity is guaranteed, and this is so without any further
constraints on P. . . . Indeed, any probability function defined on
an uncountable algebra assigns probability 0 to uncountably many
propositions, and so in that sense it is very irregular. (See Hájek
2003 for proof.)” (Hájek (preprint),19)
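The flavour of this cardinality constraint can be seen in the familiar real-valued case. The following is a sketch of the standard counting argument for a family of pairwise disjoint propositions \(\{A_i\}_{i \in I}\) in F, assuming P is real-valued and finitely additive with P(Ω)=1. It is not necessarily the proof Hájek gives, and it says nothing about non-real ranges. The index sets \(I_n\) are hypothetical notation.

```latex
\[
I_n = \{\, i \in I : P(A_i) > \tfrac{1}{n} \,\}, \qquad
i_1,\dots,i_n \in I_n \text{ distinct} \;\Longrightarrow\;
P\Big(\bigcup_{j=1}^{n} A_{i_j}\Big) = \sum_{j=1}^{n} P(A_{i_j}) > n \cdot \tfrac{1}{n} = 1 ,
\]
\[
\text{so } |I_n| < n \text{ for every } n \ge 1, \quad \text{and} \quad
\{\, i \in I : P(A_i) > 0 \,\} = \bigcup_{n \ge 1} I_n \text{ is countable.}
\]
```

Hence if I is uncountable, uncountably many of the \(A_i\) must receive probability 0, however P is chosen.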
If we wish to save
regularity, and thus rationality, we must be careful in choosing F
and P, and by extension Ω, since our choice of Ω will inevitably
affect what options are available for F (since F is defined as a
collection of subsets of Ω), and what options are suitable for the
range of P. I will argue that in practice and interpretation, our
choice of Ω puts limitations on our choice of the range of P once we
recognize that the agents using P must belong to some world in Ω.
Again from Hájek:
“Pruss (MS) generalizes this observation. Assuming the axiom of
choice[3],
he shows that if the cardinality of Ω is greater than that of the
range of P, and this range is totally ordered, then regularity
fails: either some subset of Ω gets probability 0, or some subset
gets no probability whatsoever. The upshot is that non-totally
ordered probabilities are required to save regularity—a departure
from orthodoxy so radical that I wonder whether they deserve to be
called 'probabilities' at all.” (Hájek (preprint), 20)
So our options, if we
wish to save regularity, are: to restrict the cardinality of F to no
larger than the cardinality of the chosen range of P; to expand
the cardinality of the range of P to match a chosen F; or, as Hájek
suggests as a final alternative, to give up on totally ordered
probabilities.
A probability space
whose probabilities are not totally ordered contains propositions
with well defined probabilities which are not all comparable. In such
a space we may know the probabilities of events x and y, yet not know
which one is more probable, since the very question of which is more
probable may not have an answer. An agent offered a choice between
two bets, one offering $1 if x obtains, the other $1 if y obtains,
would have no means to compare the two bets, even with perfect
reasoning ability and complete probabilistic knowledge. Although the
two bets have the same payoff and differ only in their odds, there is
no fact of the matter as to which is the better bet. But this seems
against all intuition about what probability is supposed to be about.
Given that the Bayesian program is to find rules for rational thought
(in particular 'credence updating'), I posit that we can dismiss
non-totally ordered 'probabilities' as simply not the object we wish
to study.
This leaves
judicious selection of F and P as our only recourse. Given some F we
may choose to define P so it has a range with a large enough
cardinality to allow regularity. But as Bayesian agents we want to
assign some definite probability, some element of the range of P, to
each element of F presented to us. We want to be able to evaluate the
function P at arguments f ∈ F. Therefore, if we wish to ever use
Bayesianism, as is the motivation for its development, P must be a
computable function[4].
Thus the range of P must be a set of computable numbers. But the
computable numbers are countably infinite, hence the range of P must
be at most countably infinite. Then merely from the fact that we,
along with all other plausible agents of concern, are finite, a
predetermined limit on the maximum cardinality of the range of P
arises. We thus find the range of P has a fixed maximum cardinality,
and so we are driven to the one remaining possibility for saving
regularity.
Having determined
an upper bound on the cardinality of the range of P that can be
rationally postulated, the only hope to save regularity is to
restrict the cardinality of F. F is defined to be a subset of the
power set of Ω, so the cardinality of Ω puts a limit on how large F
can be. Thus if we wish to find a rational restriction on the size of
F, it will be worthwhile to see what restrictions may be placed on Ω.
In the probability
space triple Ω is supposed to be the space of possible worlds, but
what sense of possible worlds? For the Bayesian subjectivist agent, Ω
is no more than the space of worlds whose possibility the agent is
willing or able to entertain. For some, Ω is rather large. Indeed, for
Pruss “we have a reductio ad absurdum of the assumption that the
collection of all possible worlds forms a set” (2001, 171-2). Pruss
holds that Ω is too large to be a Cantorian set, and must in fact be
a proper class. Such a result leaves F ill-defined, as there is no
such thing as the powerset of a proper class, and I find it difficult
to see how a finite agent can distinctly entertain the possibilities
of an entire proper class of states. Lewis is less extreme on the
issue than Pruss[5], but nevertheless endorses a cardinality for Ω of
at least \(2^{\mathfrak{c}}\), where \(\mathfrak{c}\) is the
cardinality of the continuum:
“But it is easy
to argue that there are more than continuum many possibilities we
ought to distinguish. It is at least possible to have continuum many
spacetime points; each point may be occupied by matter or may be
vacant; since anything may coexist with anything, any distribution
of occupancy and vacancy is possible; and there are more than
continuum many such distributions.” (Lewis 1986, 143)
Both Pruss and Lewis
are making metaphysical modal claims about possibility. I suggest
that for the Bayesian these are the wrong kinds of claims. General
metaphysical claims of potentially or 'actually' manifest higher
cardinalities are inert with respect to the decision making of finite
agents, and are thus irrelevant to the pursuit of rationality
whatever other interest they may hold.
I suggest that in a
Bayesian account Ω should be viewed as the set of all worlds (all
states of being) that can be countenanced by an agent, and which
are therefore necessarily compatible with those things the
agent assumes as axiomatic. The elements of Ω, the possible worlds,
are those abstractions of worlds which it is possible for an agent to
rationally believe correspond to the actual world. In particular they
are definable; if an agent can hold a belief about a possible world,
then the agent must be able to reference it, either by naming it or
by giving it a unique description[6].
But, whatever definitional or naming convention is chosen, only
countably many objects may be referenced by an agent, since the names
or definitions may be enumerated, say by counting them off in
'dictionary order'. Further, if “a proposition is a set of possible
worlds” (Lewis 1986, 53), then, as only countably many propositions
can be expressed, only countably many sets of possible worlds may be
referenced by an agent. Thus only a countable subset of F can ever be
accessible to an agent for consideration.
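The enumeration appealed to here can be sketched directly, assuming names and descriptions are finite strings over a finite alphabet. The function names `enumerate_descriptions` and `index_of` are hypothetical. One caveat on 'dictionary order': plain alphabetical order over strings of unbounded length never gets past the strings beginning with the first letter, so the sketch orders by length first and alphabetically within each length. Every finite string then appears at some finite position.

```python
from itertools import count, islice, product


def enumerate_descriptions(alphabet):
    """Yield every finite non-empty string over `alphabet`, shortest first."""
    symbols = sorted(alphabet)
    for length in count(1):
        # Within one length, itertools.product yields alphabetical order.
        for letters in product(symbols, repeat=length):
            yield "".join(letters)


def index_of(description, alphabet):
    """Position (from 0) of `description` in the enumeration.

    Only terminates for non-empty strings built from `alphabet`.
    """
    for i, candidate in enumerate(enumerate_descriptions(alphabet)):
        if candidate == description:
            return i


print(list(islice(enumerate_descriptions("ab"), 6)))
# ['a', 'b', 'aa', 'ab', 'ba', 'bb']
print(index_of("ba", "ab"))  # 4
```

Since each description receives a natural-number index, the describable worlds, and likewise the describable sets of worlds, are at most countably many.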
Therefore if P is
to be evaluated by the agent, then it cannot be defined on more than
a countable subset of F. Such evaluation ought to be possible, since
the motivation for the formalization has P embodying the agent's
credence assignments: P is formed by collecting the agent's
credences; it is an effect, not a cause.
Thus we find there
are independent reasons for limiting the sizes of Ω and F, while the
range of P may be as large as countably infinite. We can therefore
escape Pruss's result on the relationship between regularity and the
cardinalities of F and the range of P.
3. The Return Home:
Pruss shows that
the desire for rationality conflicts with the endorsement of some
probability space triples (Ω, F, P), depending on the cardinalities
of the three entries. But I argue that in any case a real Bayesian
agent cannot be concerned with more than: a countable subset of Ω, a
countable subset of F, and a countable restriction of a computable P.
Thus guided by a particular problem of rationality—regularity—I
argue that given a plausible set of possible worlds Ω, and set of
subsets for consideration F, a Bayesian agent may freely, and
rationally, restrict their attention to the countable portions of the
probability space about which questions can be asked, and answers can
be compared. Anything beyond those restrictions is utterly inert with
respect to decision making, belief formation, credence assignment,
and behaviour; and is thus outside the scope of the Bayesian program
and in fact outside the scope of any normative account of
probability. It remains to be proven that the motivating
problem—regularity—is in fact possible on this account, but we
have escaped at least one daunting attempted proof of its
impossibility, and I have argued a substantive claim about the kinds
of possibilities a finite Bayesian agent can consider, and the
assignments of possibility they can endorse.
Works Cited
de Finetti, Bruno. Initial Probabilities: A Prerequisite for Any Valid Induction. Synthese, Vol. 20, No. 1 (Jun., 1969), pp. 2-16.
Hájek, Alan. Staying Regular? Pre-print.
Hájek, Alan. What Conditional Probability Could Not Be. Synthese, Vol. 137, No. 3 (December 2003), pp. 273-323.
Jaynes, E.T. Probability Theory: The Logic of Science. New York: Cambridge University Press, 2003.
Lewis, David. On the Plurality of Worlds. Oxford: Basil Blackwell Ltd., 1986.
Li, Ming and Paul Vitanyi. An Introduction to Kolmogorov Complexity and its Applications. New York: Springer-Verlag, 1997.
Pruss, Alexander. Regularity and Cardinality. (MS)
Pruss, Alexander. The Cardinality Objection to David Lewis's Modal Realism. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, Vol. 104, No. 2 (May, 2001), pp. 169-178.
[1] The preceding account is an original explanation aided by reference to Li and Vitanyi's excellent text on the subject.
[2] See Jaynes, pp. 201-215, for a from-first-principles motivation and derivation of the finite-case form of Laplace's principle.
[3] As of this writing, Pruss now claims, on his personal website, to have found a proof of his result without relying on the axiom of choice.
[4] Here I assume the Church-Turing thesis.
[5] I will not directly address Lewis's and Pruss's arguments here as they are outside the scope of this paper, but I mention them as a relevant acknowledgement of how widely philosophical opinions on this question differ, and to suggest a slightly different approach to the question.
[6] Here I implicitly take the Leibnizian Principle of the Identity of Indiscernibles. If two or more possible worlds share every possible description, then with respect to the describing agent, the worlds are equivalent and cannot be coherently distinguished.