

slide-1
SLIDE 1

Exchangeability and predictive inference

Using a special type of symmetry Gert de Cooman

SYSTeMS Research Group Ghent University gert.decooman@UGent.be http://users.UGent.be/~gdcooma

Second SIPTA School on Imprecise Probabilities 25 July 2006

Gert de Cooman (UGent, SYSTeMS) Predictive inference 25 July 2006 1 / 70

SLIDE 2

Today’s main topic

SLIDE 3

Predictive inference

The general problem

Very important problem in statistics, and in science in general: Consider a system. Make a number of observations about the system. Use these observations to make inferences or predictions about the next observations. = PREDICTIVE INFERENCE

SLIDE 4

Predictive inference

Formalising the problem

We envisage making N observations X1, . . . , XN. Since, before making the k-th observation, we don't necessarily know the value of Xk, we call Xk a random variable. We assume that all random variables Xk assume values in the same finite set X . After making n observations X1 = x1, . . . , Xn = xn, we make inferences/predictions for the remaining n′ = N − n random variables Xn+1, . . . , XN.

SLIDE 5

Predictive inference

The fundamental reductive assumption

We assume that the mechanism that produces the observations Xk is essentially stationary or time-invariant. More precisely: we assume that the order in which the observations are observed is of no relevance (to the predictions).

This is a special assumption of symmetry underlying the observations, called exchangeability [de Finetti, 1937].

This assumption reduces the very difficult general problem to a solvable special case, and is quite useful in practice.

SLIDE 6

Predictive inference

An example: smoking and lung cancer

Consider a general population of people, both smokers and non-smokers. We make observations by selecting people at random, determine whether they are smokers or not, and whether they have lung cancer, or develop it during the year after the selection. The possible values for the observations are X = {S−L,S−NL,NS−L,NS−NL} where

S−L means ‘smoker that has or develops lung cancer’ S−NL means ‘smoker that does not have or develop lung cancer’ NS−L means ‘non-smoker that has or develops lung cancer’ NS−NL means ‘non-smoker that does not have or develop lung cancer’.

We are interested in the probability that a smoker selected at random from the population has or will develop lung cancer. We assume that the order in which the people are selected at random from the population is of no importance.

SLIDE 7

Modelling symmetry

SLIDE 8

An example

Tossing a coin

I am going to toss a coin in the next room. How do you model your information (beliefs) about the outcome?

Situation A: You have seen and examined the coin, and you believe it is symmetrical (not biased).
Situation B: You have no information about the coin; it may be heavily loaded, it may even have two heads or two tails.

In Situation A, there is information that the phenomenon described is invariant under permutation of heads and tails: evidence of symmetry. In Situation B, your information (none) is invariant under permutation of heads and tails: symmetry of evidence.

SLIDE 9

Modelling the available information

We want a model for the available information or evidence: a belief model.

◮ In Situation A, the belief model should reflect that there is evidence of symmetry.

◮ In Situation B, the evidence is invariant under permutations of heads and tails, so the belief model should be invariant as well.

Since the available information is different in both situations, the corresponding belief models should be different too! Belief models should be able to capture the difference between ‘symmetry of evidence’ and ‘evidence of symmetry’. This is not the case for Bayesian probability models.

SLIDE 10

What are we going to do?

Present a more general class of belief models, of which the Bayesian belief models constitute a special subclass. Explain how to model aspects of symmetry for such general belief models, and in particular:

◮ symmetry of evidence, ◮ evidence of symmetry.

Argue that both aspects are different in general, but coincide for Bayesian belief models. Being able to deal with natural symmetries is often quite useful in applications, and is of fundamental theoretical importance.

SLIDE 11

More general belief models

Accepting gambles

Consider a random variable X that may assume values x in a set X : the actual value of X is unknown to you. How do you model the available information about X? Your beliefs about X lead you to certain behaviour: accepting or rejecting gambles on the value of X.

Definition

A gamble on X is a bounded map f : X → R. The set of all gambles on X is denoted by L (X ). A gamble f associates with any possible value x of X a corresponding reward f(x), which may be negative. If you accept a gamble f, this means that after determining the actual value x of X, you will get the reward f(x).

SLIDE 12

Accepting gambles

An example: coin tossing

X is the outcome of my tossing a coin, with X = {h,t}. If you accept the gamble f with f(h) = −1 and f(t) = 2, this means that you are willing to engage in the following transaction:

◮ we determine the outcome of the toss; ◮ you win 2 if the outcome is t and you lose 1 if it is h.

If you think the coin is fair, you will accept f. You will not accept g with g(h) = −10 and g(t) = 1 unless you are quite sure that the outcome will be t. You will always accept h with h(h) = h(t) = 5.
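The acceptance pattern above can be checked mechanically. The sketch below is my own illustration, not from the slides: a Bayesian agent who is sure the coin is fair accepts exactly the gambles with non-negative expected reward, using the gambles f, g, h of this example.

```python
# A minimal sketch (my own, not from the slides): gambles on X = {h, t} as
# dicts, and the acceptance rule of an agent who believes the coin is fair.

def expectation(gamble, pmf):
    """Expected reward of a gamble under a probability mass function."""
    return sum(pmf[x] * gamble[x] for x in gamble)

fair = {'h': 0.5, 't': 0.5}

f = {'h': -1, 't': 2}   # win 2 on tails, lose 1 on heads
g = {'h': -10, 't': 1}  # only attractive if you are quite sure of tails
h = {'h': 5, 't': 5}    # a sure gain: always acceptable

# The fair-coin believer accepts a gamble iff its expectation is >= 0.
accepts = {name: expectation(gb, fair) >= 0
           for name, gb in [('f', f), ('g', g), ('h', h)]}
print(accepts)  # {'f': True, 'g': False, 'h': True}
```

As the slide says: f and the sure gain h are accepted, while g is rejected unless you are quite sure of tails.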

SLIDE 13

More general belief models

Sets of desirable gambles

We collect all the gambles you accept in a set D of desirable gambles. Such a set should satisfy the following rationality requirements:

Definition

A set of desirable gambles D is called coherent if

  • D1. if sup f < 0 then f ∉ D [avoiding sure loss]
  • D2. if f ≥ 0 then f ∈ D [accepting sure gains]
  • D3. if f ∈ D and λ ≥ 0 then λf ∈ D [scale invariance]
  • D4. if f ∈ D and g ∈ D then f + g ∈ D [combination]

So D should be a convex cone including the first orthant and containing no uniformly negative gambles.

SLIDE 14

More general belief models

An example: coin tossing

[Figure: a set of desirable gambles drawn in the (f(h), f(t))-plane, with unit marks on both axes]

SLIDE 17

Equivalent other belief models

Lower and upper previsions

The lower prevision P(f) of a gamble f is defined as P(f) = sup{µ : f − µ ∈ D}: the supremum acceptable price for buying f.
The upper prevision P(f) of a gamble f is defined as P(f) = inf{µ : µ − f ∈ D}: the infimum acceptable price for selling f.
Observe that P(f) = −P(−f) [conjugacy].
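When the belief model is given as a set M of probability mass functions (the credal-set view of a later slide), the lower and upper previsions are the lower and upper envelopes of the expectations. A minimal sketch under that assumption; the interval of coin models is made up for illustration:

```python
# Sketch (my own illustration): lower/upper previsions as lower/upper
# envelopes of expectations over a set M of probability mass functions.

def expectation(gamble, pmf):
    return sum(pmf[x] * gamble[x] for x in gamble)

def lower_prevision(gamble, M):
    return min(expectation(gamble, p) for p in M)

def upper_prevision(gamble, M):
    return max(expectation(gamble, p) for p in M)

# Suppose you only know P(h) lies somewhere between 0.4 and 0.6:
M = [{'h': a, 't': 1 - a} for a in (0.4, 0.5, 0.6)]
f = {'h': -1, 't': 2}
neg_f = {x: -v for x, v in f.items()}

print(lower_prevision(f, M))  # ≈ 0.2, attained at P(h) = 0.6
print(upper_prevision(f, M))  # ≈ 0.8, attained at P(h) = 0.4
# conjugacy: the upper prevision equals minus the lower prevision of −f
print(upper_prevision(f, M) == -lower_prevision(neg_f, M))
```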

SLIDE 18

Coherence

For lower and upper previsions

Theorem

A lower prevision P: L (X) → R is coherent if and only if

  • LP1. P(f) ≥ inff [avoiding sure loss]
  • LP2. P(λf) = λP(f) if λ ≥ 0 [non-negative homogeneity]
  • LP3. P(f +g) ≥ P(f)+P(g) [super-additivity]

For a subset A of X with indicator IA, P(A) := P(IA) is called the lower probability of A.

SLIDE 19

A special case

Bayesian belief models

When P and P coincide everywhere, we have a precise prevision P = P = P. P is then a linear functional that is monotone and normalised

  • P1. P(λf + µg) = λP(f)+ µP(g) for all real λ and µ [linearity]
  • P2. if f ≥ 0 then P(f) ≥ 0 [monotonicity]
  • P3. P(1) = 1 [normalisation]

For a subset A of X , P(A) := P(IA) is the probability of A. P restricted to events is a finitely additive probability measure. P(f) is the associated expectation of f: P(f) = ∫ f dP.

SLIDE 20

Equivalent other belief models

Three mathematically equivalent models:

  D = {f : P(f) ≥ 0} (from P) = {f : (∀P ∈ M ) P(f) ≥ 0} (from M )
  P(·) = max{s : · − s ∈ D} (from D) = min{P(·) : P ∈ M } (from M )
  M = {P : (∀f ∈ D) P(f) ≥ 0} (from D) = {P : (∀f) P(f) ≥ P(f)} (from P)

Table: Bijective relationships between the equivalent models: coherent sets of desirable gambles D, coherent lower previsions P on L (X ), and closed convex sets of previsions M .

SLIDE 21

Aspects of symmetry

for general belief models

How to model, in terms of our general belief models:

◮ symmetry of evidence, ◮ evidence of symmetry.

In mathematics (geometry, topology, linear algebra) symmetry is considered to be invariance under certain transformations.

SLIDE 22

Monoids of transformations

Transformations and permutations

A transformation T of X is a map from X to itself, i.e., T : X → X : x → Tx. A permutation π of X is a transformation of X that is onto and one-to-one.

The identity map idX , defined by idX x = x, is a permutation. Consider a monoid T of transformations T of X (not necessarily permutations), i.e.,

◮ idX belongs to T ;
◮ if T and S both belong to T then so does TS := T ◦ S.
SLIDE 23

Examples of transformations

A transformation

[Figure: an arbitrary transformation mapping X into X ]

SLIDE 25

Examples of transformations

A permutation

[Figure: a permutation of X (a one-to-one, onto transformation)]

SLIDE 27

Examples of transformations

Identity map

[Figure: the identity map on X ]

SLIDE 29

Monoids of transformations

Lifting

Transformations T act on elements x of X , but we are also interested in the corresponding transformations T that act on gambles f on X . For any gamble f, define the new gamble Tf := f ◦ T by lifting: (Tf)(x) := f(Tx). For any functional Λ on L (X ), define the new functional TΛ := Λ ◦ T by lifting yet again: (TΛ)(f) := Λ(Tf) = Λ(f ◦ T).

SLIDE 30

Monoids of transformations

An example: coin tossing

X = {h,t} and π is the permutation of X with π(h) = t and π(t) = h. Consider the gamble f with f(h) = −1 and f(t) = 2; then πf(h) = f(π(h)) = f(t) = 2 and πf(t) = f(π(t)) = f(h) = −1. Consider a set of desirable gambles D; its transform is given by πD := {πf : f ∈ D}. Observe that the corresponding lower prevision is πP: sup{µ : f − µ ∈ πD} = sup{µ : πf − µ ∈ D} = P(πf) = πP(f).
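The lifting step in this example can be sketched directly (my own illustration of the definition (Tf)(x) = f(Tx)):

```python
# Sketch of lifting: a transformation T of X induces a transformation of
# gambles via (Tf)(x) = f(Tx). Dicts stand in for maps on a finite X.

def lift(T, f):
    """Lift a point transformation T to gambles: (Tf)(x) = f(T(x))."""
    return {x: f[T[x]] for x in f}

pi = {'h': 't', 't': 'h'}   # the swap permutation of the slides
f = {'h': -1, 't': 2}
pi_f = lift(pi, f)
print(pi_f)  # {'h': 2, 't': -1}, matching πf(h) = 2 and πf(t) = −1
```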

SLIDE 31

Monoids of transformations

An example: coin tossing

[Figure: the gambles f and πf in the (f(h), f(t))-plane; π reflects across the diagonal]

SLIDE 33

Monoids of transformations

An example: coin tossing

[Figure: a set of desirable gambles D in the (f(h), f(t))-plane]

SLIDE 34

Monoids of transformations

An example: coin tossing

[Figure: the transformed set πD in the (f(h), f(t))-plane]

SLIDE 36

Weak invariance of belief models

Definition

Definition

A coherent belief model is called weakly T -invariant if the following equivalent conditions are satisfied:

  • W1. TD ⊆ D for all T ∈ T ;
  • W2. TP ≥ P for all T ∈ T ;
  • W3. TM ⊆ M for all T ∈ T .

A precise prevision is weakly T -invariant iff TP = P, or equivalently P(A) = P(T−1(A)) for all A ⊆ X and all T in T . This is the usual definition for invariance of a (probability) measure.

SLIDE 37

Weak invariance of belief models

An example: coin tossing and weak permutation invariance

[Figure: a weakly permutation-invariant set of desirable gambles in the (f(h), f(t))-plane]

SLIDE 39

Weak invariance of belief models

Observations

Weak invariance states that belief models are symmetrical; it captures 'symmetry of evidence'. For any monoid T there always are weakly T -invariant coherent lower previsions. The vacuous belief model, given by

  Dv = {f : f ≥ 0}
  Pv(f) = inf_{x∈X} f(x)
  Mv = the set of all precise previsions,

is the only coherent belief model that is weakly invariant with respect to all transformations of X . It models complete ignorance.

SLIDE 40

Strong invariance of belief models

Definition

How can we model that we believe there is symmetry, characterised by a monoid T , behind the random variable X? Consider a gamble f and its transform Tf. Because of the symmetry, you should be willing to exchange f for Tf and vice versa: f −Tf ∈ D and Tf −f ∈ D. Define DT := {Tf −f : f ∈ L (X ),T ∈ T }.

Definition

A coherent belief model is called strongly T -invariant if the following equivalent conditions are satisfied:

  • S1. DT ⊆ D;
  • S2. P(f −Tf) = 0 for all T ∈ T and f ∈ L (X );
  • S3. All precise previsions in M are (weakly) T -invariant.

SLIDE 41

Strong invariance of belief models

An example: coin tossing

X = {h,t} and T = {idX , π}. Observe that DT = {(f(h) − πf(h), f(t) − πf(t)) : f ∈ L (X )} = {(f(h) − f(t), f(t) − f(h)) : f ∈ L (X )} = {(x, −x) : x ∈ R}. The only strongly permutation-invariant belief model is the precise model that assigns probability 1/2 to both h and t.

SLIDE 42

Strong invariance of belief models

An example: coin tossing

[Figure: the set DT , the line {(x, −x) : x ∈ R}, in the (f(h), f(t))-plane]

SLIDE 43

Strong invariance of belief models

Observations

Strong invariance captures ‘evidence of symmetry’. Strong invariance implies weak invariance: 0 = P(Tf −f) ≤ P(Tf)−P(f). For precise previsions, strong and weak invariance coincide: 0 = P(f −Tf) = P(f)−P(Tf). Bayesian models cannot distinguish between ‘evidence of symmetry’ and ‘symmetry of evidence’. A coherent lower prevision is strongly invariant if and only if it is a lower envelope of invariant linear previsions.

SLIDE 44

Strong invariance of belief models

Observations

No coherent belief model is strongly invariant with respect to all transformations of X . For a general monoid T there need not exist strongly T -invariant coherent lower previsions (or, equivalently, precise previsions). This is the problem of the existence of T -invariant means, or of the amenability of the monoid T ; a necessary and sufficient condition is

  sup_{x∈X} ∑_{k=1}^{n} [fk(x) − Tkfk(x)] ≥ 0

for all n ≥ 0, Tk ∈ T and fk ∈ L (X ). All this fascinating material is covered in much detail in [De Cooman and Miranda, 2006] and to some extent also in [Walley, 1991].

SLIDE 45

Strong permutation invariance

A special case

Let X be a finite set and let P be a (finite) group of permutations π of X , i.e., a monoid such that

◮ for all π in P there is some inverse ϖ ∈ P such that π ◦ ϖ = ϖ ◦ π = idX .

An event A ⊆ X is P-invariant if πA = {πx : x ∈ A} = A for all π in P. Fact: the smallest P-invariant sets (atoms) constitute a partition of X :

  [x]P := {πx : π ∈ P},

and AP := {[x]P : x ∈ X } is the set of all P-invariant atoms.
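Computing the atoms [x]P as orbits is straightforward when the whole group is listed, since then {πx : π ∈ P} is already the full orbit. A sketch with a group of my own choosing: a cyclic group of parity-preserving permutations of a die, which produces the atoms used in Problem 3 below.

```python
# Sketch: the invariant atoms (orbits) [x]_P = {πx : π ∈ P} of a permutation
# group P, with permutations represented as dicts on X. Works because P is
# listed in full (a group), so the image set of x is its whole orbit.

def atoms(X, P):
    """Partition X into the orbits of the permutation group P."""
    seen, parts = set(), []
    for x in X:
        if x not in seen:
            orbit = frozenset(pi[x] for pi in P)
            seen |= orbit
            parts.append(orbit)
    return parts

X = [1, 2, 3, 4, 5, 6]
id_ = {i: i for i in X}
c1 = {1: 3, 3: 5, 5: 1, 2: 4, 4: 6, 6: 2}   # (1 3 5)(2 4 6)
c2 = {1: 5, 5: 3, 3: 1, 2: 6, 6: 4, 4: 2}   # its inverse
P = [id_, c1, c2]  # a cyclic group preserving parity

print(atoms(X, P))  # two atoms: {1, 3, 5} and {2, 4, 6}
```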

SLIDE 46

Strong permutation invariance

Invariant atoms

[Figure: X partitioned into invariant atoms A1, . . . , A6; AP = {A1, . . . , A6}]

SLIDE 47

Strong permutation invariance

Fundamental theorem

Theorem

A coherent lower prevision P on L (X ) is strongly P-invariant if and only if it has the following form:

  P(f) = Q(Pu(f|·)),

where Q is any coherent lower prevision on L (AP). Pu(f|·) is a gamble on AP, whose value in any invariant atom A ∈ AP is given by

  Pu(f|A) = (1/|A|) ∑_{x∈A} f(x),

so Pu(·|A) is the precise prevision whose probability mass is distributed uniformly over A.

SLIDE 48

Exercises on symmetry

SLIDE 49

Exercises on symmetry

Problem 1

Consider a space with two elements: X = {a,b}.

1. Show that any linear prevision on this space can be written as Pα(f) = αf(a) + (1−α)f(b) for some α ∈ [0,1]. Actually α = Pα({a}) and 1−α = Pα({b}).

2. Show that for any gamble f: f = f(a) + [f(b) − f(a)]I{b} = f(b) + [f(a) − f(b)]I{a}.

3. Show that all coherent lower previsions P on L (X ) are linear-vacuous mixtures: there are α and ε in [0,1] such that P(f) = εPα(f) + (1−ε) min f = ε[αf(a) + (1−α)f(b)] + (1−ε) min{f(a), f(b)}. [Hint: let P({a}) = εα and P({b}) = ε(1−α).]

SLIDE 50

Exercises on symmetry

Problem 2

Consider a space with two elements, X = {a,b}, and the set P of all permutations of X .

1. What are the elements of P?

2. What are the invariant atoms?

3. Show that all weakly P-invariant coherent lower previsions P on L (X ) are given by P(f) = εP1/2(f) + (1−ε) min f = ε (f(a) + f(b))/2 + (1−ε) min{f(a), f(b)} for some ε in [0,1].

4. Show that P1/2 is the only strongly P-invariant coherent lower prevision on L (X ).

SLIDE 51

Exercises on symmetry

Problem 3

Consider casting a die: X = {1,2,3,4,5,6}, and suppose there is evidence of symmetry between all even outcomes, and between all odd outcomes: you have reason not to distinguish between 2, 4 and 6 on the one hand, and 1, 3 and 5 on the other hand. In other words, the invariant atoms are {1,3,5} and {2,4,6}.

1. Characterise all the strongly invariant coherent lower previsions for this type of symmetry.

2. Characterise all the strongly invariant precise previsions for this type of symmetry. [Hint: use the results of Problem 1, and the Fundamental Theorem on Strong Permutation Invariance.]

SLIDE 52

Exchangeability

SLIDE 53

Exchangeability

Basic notation

Consider N random variables Xk taking values in a finite set X . Let X^N = ×_{k=1}^{N} X be the set of possible values x = (x1,...,xN) for the random variable X = (X1,...,XN). The available information about the value of X will be described by a coherent lower prevision P on L (X^N). A gamble f on X^N maps possible values x of X to real rewards f(x).

SLIDE 54

Exchangeability

Definition

Now assume you have evidence that the Xk are generated by the same process at different times, and that the time at which we observe the process is of no consequence.

So you have evidence that there is permutation symmetry between the times k ∈ {1,...,N}. Consider a permutation π of {1,...,N} and use lifting to turn it into a permutation π of X^N: πx = π(x1,...,xN) = (xπ(1),...,xπ(N)). The evidence of symmetry requires that we should use a model P^N_X on L (X^N) that is strongly invariant with respect to the set P^N of all such permutations, i.e., exchangeable.

SLIDE 55

Fundamental theorem

Preparatory work

We shall apply our general fundamental theorem for strong permutation invariance to this special case. This will lead to a generalisation of de Finetti's representation theorem for finite exchangeable sequences to imprecise probabilities. In order to apply the theorem, we need to take a closer look at

◮ the invariant atoms of X^N, given by [x] := {πx : π ∈ P^N};
◮ the uniform previsions Pu(·|[x]).
SLIDE 56

Characterising the invariant atoms

An example: coin tossing

X = {h,t} and a sequence of N = 4 tosses: x = (h,h,t,h). Elements of [x]: (h,h,h,t), (h,h,t,h), (h,t,h,h), (t,h,h,h). [x] is completely characterised by the fact that there are three h and one t. With [x] there corresponds a count vector m = (mh, mt) = (3,1) in the set

  N^4_X = {(mh, mt) ∈ N^X : mh + mt = 4}.

[x] contains ν(m) elements:

  ν(m) = N!/(mh! mt!) = 4!/(3!·1!) = 4.

SLIDE 57

Characterising the invariant atoms

General case

Count how many times a category z ∈ X has occurred in a sample x = (x1,...,xN) of length N:

  mz = Tz(x) := |{k ∈ {1,...,N} : xk = z}|.

This leads to the count vector m = TX (x), whose components mz = Tz(x), z ∈ X , count the number of times that the category z has been observed during the N observations. The set of possible count vectors after N observations is

  N^N_X := {m ∈ N^X : ∑_{z∈X} mz = N}.

The counting map TX maps sample sequences x ∈ X^N of length N to corresponding count vectors m = TX (x) in N^N_X .

SLIDE 58

Characterising the invariant atoms

General case

x and y belong to the same invariant atom (smallest invariant subset) if and only if they have the same count vector: TX (x) = TX (y). TX induces a one-to-one correspondence between invariant atoms and count vectors in N^N_X . We shall identify an invariant atom by its count vector m:

  x ∈ [m] ⇔ TX (x) = m.

The invariant atom [m] has

  ν(m) = N!/∏_{z∈X} mz!

elements.
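The counting map TX and the atom size ν(m) can be sketched in a few lines and checked against the coin example above (the helper names are mine):

```python
# Sketch: the counting map T_X and the multinomial atom size
# ν(m) = N!/∏_z m_z!, checked on the slides' example x = (h, h, t, h).
from collections import Counter
from math import factorial, prod

def count_vector(x, categories):
    """T_X(x): how often each category occurs in the sample x."""
    c = Counter(x)
    return tuple(c[z] for z in categories)

def nu(m):
    """Number of sequences sharing the count vector m."""
    N = sum(m)
    return factorial(N) // prod(factorial(k) for k in m)

m = count_vector(('h', 'h', 't', 'h'), ('h', 't'))
print(m, nu(m))  # (3, 1) 4
```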

SLIDE 59

Characterising the uniform previsions

The multiple hyper-geometric distribution

Consider the precise prevision Pu(·|m) whose probability mass is uniformly distributed over the invariant atom with count vector m:

  Pu(f|m) := (1/ν(m)) ∑_{x∈[m]} f(x) = ∑_{x∈X^N} f(x) pu(x|m) = MuHy^N_X(f|m),

where the probability mass is given by

  pu(x|m) = 1/ν(m) = ∏_{z∈X} mz!/N! if x ∈ [m], and pu(x|m) = 0 otherwise.

Imagine an urn with N balls of possible types X , of composition m, meaning that there are mz balls of type z ∈ X . pu(x|m) is the probability of drawing, without replacement, the sequence of balls x = (x1,...,xN) from this urn with composition m: it is the multiple hyper-geometric distribution.

SLIDE 60

Characterising the uniform previsions

An example: drawing balls from an urn

Consider an urn with N = 5 balls: 2 red, 2 green, and 1 yellow. X = {r,g,y} and m is such that mr = 2, mg = 2 and my = 1. Probability of drawing the sequence

◮ (r,r,g,y,r): zero (there are only two red balls);
◮ (r,r,g,y,g): 1/ν(m) = mr! mg! my!/N! = (2!·2!·1!)/5! = (2·1·2·1·1)/(5·4·3·2·1) = 1/30;
◮ another way of seeing this: take out the balls one by one, (2/5)·(1/4)·(2/3)·(1/2)·(1/1) = 1/30.
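Both ways of computing this can be checked mechanically with exact rationals (a sketch; the helper `p_sequence` is my own):

```python
# Sketch checking the urn computation two ways: the ball-by-ball product of
# conditional chances, and the closed form p_u(x|m) = ∏_z m_z!/N!.
from fractions import Fraction
from math import factorial

def p_sequence(x, m):
    """Chance of drawing the exact sequence x without replacement from an
    urn of composition m (dict: type -> number of balls); 0 if impossible."""
    urn = dict(m)
    p = Fraction(1)
    for ball in x:
        total = sum(urn.values())
        if urn.get(ball, 0) == 0:
            return Fraction(0)
        p *= Fraction(urn[ball], total)
        urn[ball] -= 1
    return p

m = {'r': 2, 'g': 2, 'y': 1}
print(p_sequence(('r', 'r', 'g', 'y', 'r'), m))  # 0: only two red balls
print(p_sequence(('r', 'r', 'g', 'y', 'g'), m))  # 1/30

# closed form: ∏_z m_z!/N! = 2!·2!·1!/5!
closed = Fraction(factorial(2) * factorial(2) * factorial(1),
                  factorial(sum(m.values())))
print(closed)  # 1/30
```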

SLIDE 61

Exchangeability

Representation Theorem for finite exchangeable sequences

Use the Fundamental Theorem for Strong Permutation Invariance to get:

Theorem

A coherent lower prevision P^N_X on L (X^N) is exchangeable if and only if there is some coherent lower prevision Q^N_X on L (N^N_X) such that

  P^N_X(f) = Q^N_X(MuHy^N_X(f|·)).

P^N_X is the belief model for the values of the observations random variable X. Q^N_X is the corresponding belief model for the values of the count random variable TX (X).

SLIDE 62

Exchangeability

Meaning of the Representation Theorem

Sampling without replacement from an urn with known composition m is an exchangeable process. Sampling without replacement from an urn with unknown composition, described by the lower prevision Q^N_X on the set of possible compositions N^N_X , is still an exchangeable process. Any exchangeable process can be interpreted as sampling (without replacement) from an urn with unknown composition, where the information about the composition is given by some lower prevision Q^N_X .

de Finetti's case: when Q^N_X is a precise prevision Q^N_X.

All the information is in the count lower prevision Q^N_X : the counts are a sufficient statistic.

SLIDE 63

Predictive inference

Basic setup

We have n∗ exchangeable random variables X1, . . . , Xn∗ in X , leading to a vector X∗ = (X1,...,Xn∗) with possible values z∗ = (z1,...,zn∗) ∈ X^n∗, with lower prevision P^n∗_X on L (X^n∗) and count lower prevision Q^n∗_X on L (N^n∗_X).

We have n < n∗ observations X1 = x1, . . . , Xn = xn, i.e., we know that X = x = (x1,...,xn) with TX (x) = m ∈ N^n_X .

Predictive inference: we are interested in the n′ = n∗ − n remaining, as yet unobserved variables (Xn+1,...,Xn∗) assuming values in X^n′, and we want to know the predictive (updated) lower prevision P^n′_X(·|x) on L (X^n′).

SLIDE 64

Predictive inference

Applying the Generalised Bayes Rule

Consider a gamble f on X^n′; then coherence (GBR) requires that the predictive lower prevision P^n′_X(f|x) is the solution µ of the equation

  P^n∗_X(I_{{x}×X^n′} (f − µ)) = 0.

Use the Representation Theorem to see that

  P^n∗_X(h) = Q^n∗_X(MuHy^n∗_X(h|·)).

Solving this equation explicitly for µ is quite involved, but in principle you have all the necessary tools and information from the lectures on the first day. I won't do it here, but you are welcome to try it on your own. It is given in full detail in [De Cooman et al., 2006].

SLIDE 65

Predictive inference

The updating formula

P^n′_X(f|x) = Q^n′_X(MuHy^n′_X(f|·)|m),

where for gambles g on N^n′_X , Q^n′_X(g|m) is the solution α of

  Q^n∗_X(Lm(g − α)) = 0 (another GBR),

with

  (Lm g)(m∗) = Lm(m∗) g(m∗ − m) if m∗ ≥ m, and (Lm g)(m∗) = 0 otherwise,

and where the likelihood Lm(m∗) = ν(m∗ − m)/ν(m∗) gives the probability of drawing any particular sequence of n balls with composition m from an urn with n∗ balls of composition m∗.

SLIDE 66

Predictive inference

General observations

The random variables Xn+1, . . . , Xn∗ remain exchangeable after the observation x: post-data exchangeability. P^n′_X(·|x) only depends on the data x through the likelihood function Lm = L_{TX(x)}. This type of predictive inference:

◮ is coherent;
◮ satisfies the likelihood principle;
◮ has counts as a sufficient statistic: dependence on the data x only through the count vector m = TX (x);
◮ satisfies the stopping rule principle.

If the unknown composition of the urn is described by Q^n∗_X, then after drawing a sample of n balls with composition m from it, the composition of the n′ balls remaining in the urn is described by Q^n′_X(·|m).

SLIDE 67

Predictive inference

The precise case

Suppose Q^n∗_X is a precise prevision Q^n∗_X on N^n∗_X with mass function q^n∗_X. Then Q^n′_X(·|m) is a precise prevision Q^n′_X(·|m) given by Bayes' rule:

  Q^n′_X(g|m) = Q^n∗_X(Lm g) / Q^n∗_X(Lm),

where

  Q^n∗_X(Lm g) = ∑_{m∗∈N^n∗_X} (Lm g)(m∗) q^n∗_X(m∗)
             = ∑_{m∗∈N^n∗_X} [ν(m∗ − m)/ν(m∗)] g(m∗ − m) q^n∗_X(m∗)
             = ∑_{m′∈N^n′_X} [ν(m′)/ν(m + m′)] g(m′) q^n∗_X(m + m′).


slide-68
SLIDE 68

Predictive inference

The precise case

So in particular

Q^n*_X(L_m) = ∑_{m′∈N^n′_X} [ν(m′)/ν(m + m′)] q^n*_X(m + m′).

For the updated mass function q^n′_X(·|m) on N^n′_X:

q^n′_X(m′|m) = [ν(m′)/ν(m + m′)] q^n*_X(m + m′) / Q^n*_X(L_m).

This is a Rule of Succession. If you can do it for the precise case, you can (usually) do it for the imprecise case by taking lower envelopes.
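This Rule of Succession can be sketched in a few lines of Python (illustrative names; `q_nstar` is any prior mass function on N^{n*}_X, given as a function of a count tuple):

```python
from math import factorial, prod

def nu(m):
    """Multinomial coefficient: number of sequences with counts m."""
    return factorial(sum(m)) // prod(factorial(k) for k in m)

def count_vectors(n, k):
    """Enumerate N^n_X: count vectors of length k summing to n."""
    if k == 1:
        return [(n,)]
    return [(i,) + rest for i in range(n + 1)
            for rest in count_vectors(n - i, k - 1)]

def updated_mass(q_nstar, m, n_prime):
    """q(m'|m) is proportional to nu(m')/nu(m+m') * q_nstar(m+m');
    the normaliser equals Q^{n*}_X(L_m)."""
    weights = {}
    for mp in count_vectors(n_prime, len(m)):
        tot = tuple(a + b for a, b in zip(m, mp))
        weights[mp] = nu(mp) / nu(tot) * q_nstar(tot)
    z = sum(weights.values())
    return {mp: w / z for mp, w in weights.items()}
```

With binary X and a uniform prior on N^{n*}_X, this reproduces Laplace's rule of succession (s + 1)/(n + 2), as in Problem 4.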


slide-69
SLIDE 69

Predictive inference

Dirichlet–Multinomial priors

It is a good exercise to do this for the Dirichlet–Multinomial priors:

q^n*_X(m*) = DiMu^n*_X(m*|α) := [∏_{x∈X} C(m*_x + α_x − 1, m*_x)] / C(n* + ∑_{x∈X} α_x − 1, n*),

where C(a, b) denotes the binomial coefficient, to get the remarkably simple updating formula:

q^n′_X(m′|m) = DiMu^n′_X(m′|α + m),

where α is an X-tuple with components α_x, x ∈ X. The ID(M)M is obtained by taking lower envelopes over all α such that ∑_{x∈X} α_x = s.
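Here is a hedged Python sketch of that exercise, restricted to integer α so that `math.comb` applies (the slides allow any real α_x > 0, which would need generalised binomial coefficients); it checks the stated identity q^n′_X(·|m) = DiMu^n′_X(·|α + m) numerically against the Rule of Succession.

```python
from math import comb, factorial, prod

def nu(m):
    """Multinomial coefficient: number of sequences with counts m."""
    return factorial(sum(m)) // prod(factorial(k) for k in m)

def dimu(m, alpha):
    """Dirichlet-Multinomial mass function (integer alpha for simplicity)."""
    num = prod(comb(mx + ax - 1, mx) for mx, ax in zip(m, alpha))
    return num / comb(sum(m) + sum(alpha) - 1, sum(m))

def dimu_update(m, m_prime, alpha):
    """Posterior q(m'|m) via the Rule of Succession with prior DiMu(.|alpha)."""
    def vectors(n, k):
        if k == 1:
            return [(n,)]
        return [(i,) + r for i in range(n + 1) for r in vectors(n - i, k - 1)]
    def weight(mp):
        tot = tuple(a + b for a, b in zip(m, mp))
        return nu(mp) / nu(tot) * dimu(tot, alpha)
    z = sum(weight(mp) for mp in vectors(sum(m_prime), len(m)))
    return weight(m_prime) / z
```

For instance, with m = (1, 0) and α = (2, 1), the succession formula and DiMu^1(·|(3, 1)) both give success probability 3/4.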


slide-70
SLIDE 70

Exercises on exchangeability


slide-71
SLIDE 71

Exercises on exchangeability

Problem 4

Consider the problem of immediate prediction in a space X with two possibilities: success and failure. You make n observations, s of which give a success, and n − s of which a failure: m = (s, n − s). I want you to calculate the probability q^n′_X(m′|m) of a success on the next observation, so n* = n + 1, n′ = 1 and m′ = (1, 0). For this you need the prior mass function q^{n+1}_X on

N^{n+1}_X = {(k, n + 1 − k): k = 0, 1, ..., n + 1}.

1. Show that this is given by

(s + 1)σ / [(s + 1)σ + (n − s + 1)φ],

where σ = q^{n+1}_X(s + 1, n − s) and φ = q^{n+1}_X(s, n − s + 1).

2. What happens when σ = φ (Bayes–Laplace), or when your prior model is vacuous?
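If you want to sanity-check your answer, here is a small Python sketch of the stated formula (`q` is any prior mass function on N^{n+1}_X, passed as a function of the two counts; the name is illustrative):

```python
def success_prob(n, s, q):
    """P(next is a success | s successes in n observations)
    = (s+1)*sigma / ((s+1)*sigma + (n-s+1)*phi),
    with sigma = q(s+1, n-s) and phi = q(s, n-s+1)."""
    sigma = q(s + 1, n - s)
    phi = q(s, n - s + 1)
    return (s + 1) * sigma / ((s + 1) * sigma + (n - s + 1) * phi)
```

With σ = φ (for instance a uniform prior on counts) this collapses to (s + 1)/(n + 2), Laplace's rule of succession; letting σ or φ tend to 0 traces out the vacuous limits 0 and 1.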


slide-72
SLIDE 72

Exercises on exchangeability

Problem 5

Consider a sequence of random variables Y1, . . . , Yn, . . . that are mutually independent, assume values in the same set X, and all have the same mass function θ: θ_z is the probability that Y_k = z, for all z ∈ X and all k. Here θ is an element of the X-simplex

Σ_X = {θ ∈ R^X_+ : ∑_{z∈X} θ_z = 1}.

1. For any n ≥ 1 and any gamble f on X^n, give the prevision Mn^n_X(f|θ). [It is a multinomial prevision.]

2. Show, using the Representation Theorem, that Y1, . . . , Yn are exchangeable, and derive the corresponding counting prevision CoMn^n_X(g|θ) for any gamble g on N^n_X.

3. For any coherent lower prevision R_X on L(Σ_X), show that the lower prevision P^n_X(f) = R_X(Mn^n_X(f|·)) is still exchangeable, and give the corresponding counting lower prevision Q^n_X.
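For part 1, a brute-force numerical sketch may help (illustrative, with X identified with {0, ..., k−1}): the multinomial prevision is just the i.i.d. expectation over X^n, and its permutation invariance, the exchangeability claimed in part 2, can be checked directly on small examples.

```python
from itertools import product
from math import prod

def mn_prev(f, theta, n):
    """Mn^n_X(f|theta): expectation of the gamble f on X^n under
    i.i.d. sampling from the mass function theta."""
    k = len(theta)
    return sum(f(x) * prod(theta[i] for i in x)
               for x in product(range(k), repeat=n))
```

For example, with θ = (0.3, 0.7), the indicators of the sequences (0, 1) and (1, 0) get the same prevision 0.21, as exchangeability requires.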


slide-73
SLIDE 73

For further reading I

  • G. de Cooman and E. Miranda.
Symmetry of models versus models of symmetry. In W. L. Harper and G. R. Wheeler, editors, Probability and Inference: Essays in Honor of Henry E. Kyburg, Jr. King's College Publications, 2006. Accepted for publication.

  • G. de Cooman, E. Miranda, and E. Quaeghebeur.
Exchangeable lower previsions. 2006. In preparation.

  • B. de Finetti.
La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, 7:1–68, 1937. English translation in [Kyburg and Smokler, 1964].


slide-74
SLIDE 74

For further reading II

  • H. E. Kyburg Jr. and H. E. Smokler, editors.
Studies in Subjective Probability. Wiley, New York, 1964. Second edition (with new material) 1980.

  • P. Walley.
Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, 1991.
