SLIDE 1

Non-Additive Measures and their Applications to Decision Theory

Jean-Yves Jaffray, LIP6, UPMC-Paris6

2nd SIPTA Summer School – Madrid, July 24-28, 2006

slide-2
SLIDE 2

The decision-theoretic Approach

Its specific features

  • A representation of data/beliefs concerning a family of events (e.g., subjective probabilities) is of no interest by itself

  • It only becomes interesting when it gets embedded into a decision model (e.g., subjective probabilities into subjective expected utility theory)

SLIDE 3

Choice vs Preference

P.A. SAMUELSON, Revealed Preference Theory

  • People make choices: a choice consists in selecting a decision d_chosen from a feasible decision set D. Choices are observable (at least in principle)

  • Decision theory hypothesizes the existence of preferences: the assumed preference relation ≿ determines the preferred element d_pref of decision set D

  • Choice function Ch : D → Ch(D) = d_chosen reveals the preference relation when, for every decision set D, d_pref = d_chosen.

SLIDE 4

Descriptive vs. Normative Decision Theory

  • The objective of descriptive decision models is to explain parsimoniously people’s observed choices, as well as to predict accurately their future choices

  • The goal of normative decision models is to help people make better decisions by: (i) providing guidelines for rational behavior; and (ii) being operational in real applications

  • Normative decision models generally serve as references for building descriptive models

SLIDE 5

Representation Theorems in Decision Theory

  • When possible, preferences are expressed by a decision criterion V which reduces the comparison of two decisions a and b to that of two numbers V(a) and V(b); V is a utility function.

  • A representation theorem states that a system of axioms - a list of properties of the preference relation - is sufficient (resp. necessary and sufficient, in the best cases) for the existence of a decision criterion representing it

  • The axiom system generally comprises rationality axioms, behavioral axioms and technical axioms

  • A representation theorem unveils the implicit assumptions on which the criterion rests; they can then be tested separately (descriptive models) or discussed (normative models)

SLIDE 6

Decision Making under Risk

  • Risk denotes a decision situation in which events are endowed with extraneous probabilities Π known to the DM

  • Whether or not the DM accepts these probabilities (makes them his own subjective probabilities) can be revealed by his choices

  • Each decision d generates a probability distribution P on the consequence set C: P(B) = Π({d yields a consequence in B}), B ⊆ C

  • A finite-range probability with support S = {c_i, i ∈ I}, I finite, is called a lottery; it is completely defined by {P({c_i}), i ∈ I}

  • Preferences are assumed to depend only on these probability distributions: ≿ is defined on P = {P}

SLIDE 7

Expected Utility (EU) criterion under Risk

  • Linear utility theory (VON NEUMANN and MORGENSTERN, 1947) derives from a series of axioms (Ordering, Independence, Continuity, Dominance) the validity of the Expected Utility (EU) criterion: the utility U(P) of probability P is the mathematical expectation of a function u with respect to P,
    U(P) = ∫_C u(c) dP

  • Case of a lottery P with support S = {c_i, i ∈ I}:
    U(P) = Σ_{i=1}^{n} P({c_i}) u(c_i)

  • u : C → R is the von Neumann-Morgenstern (vNM) utility function. Its shape is interpreted as indicating the DM’s attitude w.r.t. risk; e.g., u concave ⇐⇒ Risk-Aversion.
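The finite-support formula can be sketched in a few lines of Python; the square-root utility below is an assumed illustrative choice, and the example shows a risk-averse DM who is indifferent between 25 for sure and a fair coin on {0, 100} (certainty equivalent 25, well below the mean 50):

```python
import math

def expected_utility(lottery, u):
    """EU of a finite lottery given as {consequence: probability}."""
    assert abs(sum(lottery.values()) - 1.0) < 1e-9
    return sum(p * u(c) for c, p in lottery.items())

u = math.sqrt                # an assumed concave (risk-averse) vNM utility

risky = {0: 0.5, 100: 0.5}   # fair coin between 0 and 100
sure = {25: 1.0}             # 25 for sure
print(expected_utility(risky, u))  # 5.0
print(expected_utility(sure, u))   # 5.0 -> indifference
```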

SLIDE 8

Decision Making under Uncertainty

  • Decisions are identified with acts, which are mappings Ω → C, where Ω is the set of states of nature, and events are subsets of it

  • Information concerning the events can take various forms, from purely objective to purely subjective ones, and be more or less precise: complete ignorance, objective (resp. subjective) upper/lower probabilities, etc.

  • In the purely subjective case, in which beliefs concerning the events are totally implicit, the standard axiomatic model, due to L.J. SAVAGE (1954), justifies a Subjective Expected Utility (SEU) criterion

SLIDE 9

Subjective Expected Utility (SEU) (I)

L.J. SAVAGE (1954)

  • In SAVAGE’s model, preferences ≿ are defined on the set A of all acts and required to satisfy a list of axioms

  • If act f_E, offering prize (M, m) on E, i.e. f_E(ω) = M for ω ∈ E and f_E(ω) = m for ω ∉ E, is preferred to the similar act f_E′, then E is declared qualitatively more probable than E′. This relation turns out to be a comparative probability which is uniquely representable by a (quantitative) probability Π

  • Given Π, decisions d generate probability distributions P_d on C, on which ≿ induces a relation satisfying the vNM axioms
SLIDE 10

Subjective Expected Utility (SEU) (II)

  • Since probability distributions P_d on C are ordered by a relation ≿ satisfying the vNM axioms,

  • the utility V(d) of decision d is thus
    V(d) = U(P_d) = ∫_C u(c) dP_d = ∫_Ω u(d(ω)) dΠ

  • or, in the case of a finite range,
    V(d) = Σ_{i=1}^{n} Π(d⁻¹({c_i})) u(c_i)

SLIDE 11

The Sure Thing Principle (I)

This is SAVAGE’s key axiom

  • If acts a, b, a′, b′ and an event E satisfy
    a|E = a′|E ,  b|E = b′|E ,  a|Eᶜ = b|Eᶜ ,  a′|Eᶜ = b′|Eᶜ,
    then a ≿ b ⇐⇒ a′ ≿ b′

  • In words: common modifications of common parts of acts should not change their ranking

SLIDE 12

The Sure Thing Principle (II)

A common modification of the common part of acts a and b does not change their ranking: a ≿ b ⇐⇒ a′ ≿ b′.

[Figure: a table of acts over the partition (E, Eᶜ): a and a′ coincide on E, b and b′ coincide on E; a and b coincide on Eᶜ, a′ and b′ coincide on Eᶜ]

SLIDE 13

The ALLAIS Paradox

M. ALLAIS (1953)

  • Consider lotteries A, B, C, and D
    A: 10 k€ with certainty ; B: 50 k€ with probability 10/11, nothing otherwise;
    C: 10 k€ with probability 11/100, nothing otherwise;
    D: 50 k€ with probability 10/100, nothing otherwise

  • U(A) = u(10) ; U(B) = (10/11) u(50) + (1/11) u(0)
    U(C) = (11/100) u(10) + (89/100) u(0) = (11/100) U(A) + (89/100) u(0)
    U(D) = (10/100) u(50) + (90/100) u(0) = (11/100) U(B) + (89/100) u(0)

  • EU predicts (property of means): A ≻ B ⇐⇒ C ≻ D

  • Experimental result: more than half of the subjects choose A against B and D against C; EU is not a good descriptive model!
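The "property of means" can be checked mechanically: for every utility u, U(C) − U(D) = (11/100)[U(A) − U(B)], so EU forces A ≻ B ⇐⇒ C ≻ D. A small Python sketch (the exponent 0.7 is an arbitrary illustrative utility, not from the slides):

```python
def eu(lottery, u):
    return sum(p * u(c) for c, p in lottery.items())

# The four ALLAIS lotteries (prizes in k-euros)
A = {10: 1.0}
B = {50: 10/11, 0: 1/11}
C = {10: 11/100, 0: 89/100}
D = {50: 10/100, 0: 90/100}

# For any increasing u, U(C) - U(D) = (11/100) * (U(A) - U(B)):
u = lambda c: c ** 0.7          # arbitrary illustrative utility
lhs = eu(C, u) - eu(D, u)
rhs = (11/100) * (eu(A, u) - eu(B, u))
print(abs(lhs - rhs) < 1e-12)   # True
```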

SLIDE 14

Rank Dependent Utility (RDU) under Risk

  • Rank Dependent Utility (RDU) theory (J. QUIGGIN, 1982) is an axiomatic model which is more flexible than EU and accommodates the ALLAIS paradox.

  • Besides the vNM utility u, RDU possesses an additional parameter, a function ϕ operating on the decumulative distribution functions G(c) = P({c′ : u(c′) > u(c)})

  • ϕ is called the weighting function; ϕ is strictly increasing from [0, 1] onto [0, 1] (thus ϕ(0) = 0 and ϕ(1) = 1).

SLIDE 15

A typical weighting function

which is consistent with certainty and potential effects

[Figure: graph of a typical weighting function ϕ on [0, 1], steep near 0 (potential effect) and near 1 (certainty effect), shallow in between]

SLIDE 16

Expression of the RDU criterion: the finite case

  • In EU, a lottery P with support {c_i, i = 1, .., n}, where u(c_1) ≤ u(c_2) ≤ .. ≤ u(c_n), has utility
    U(P) = Σ_{i=1}^{n−1} [Σ_{j=i}^{n} P({c_j}) − Σ_{j=i+1}^{n} P({c_j})] u(c_i) + P({c_n}) u(c_n)
         = u(c_1) + Σ_{i=1}^{n−1} (Σ_{j=i+1}^{n} P({c_j})) [u(c_{i+1}) − u(c_i)]

  • Operating transformation ϕ on the decumulated sums we get
    V(P) = Σ_{i=1}^{n−1} [ϕ(Σ_{j=i}^{n} P({c_j})) − ϕ(Σ_{j=i+1}^{n} P({c_j}))] u(c_i) + ϕ(P({c_n})) u(c_n)
         = u(c_1) + Σ_{i=1}^{n−1} ϕ(Σ_{j=i+1}^{n} P({c_j})) [u(c_{i+1}) − u(c_i)]

  • The dominant behavioral pattern of the ALLAIS Paradox is now acceptable; it requires a weighting function ϕ satisfying ϕ(10/11) < ϕ(10/100) / ϕ(11/100)
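The telescoped form of the RDU criterion is easy to implement. The sketch below uses an assumed convex weighting function and an assumed concave utility (neither is prescribed by the slides) that together reproduce the dominant ALLAIS pattern:

```python
import math

def rdu(lottery, u, phi):
    """RDU via the telescoped form:
    V = u(c_1) + sum_i phi(P(better than c_i)) * [u(c_{i+1}) - u(c_i)]."""
    cs = sorted(lottery, key=u)                    # u(c_1) <= ... <= u(c_n)
    v = u(cs[0])
    for i in range(len(cs) - 1):
        dec = sum(lottery[c] for c in cs[i + 1:])  # decumulated sum
        v += phi(dec) * (u(cs[i + 1]) - u(cs[i]))
    return v

# Assumed illustrative parameters: convex weighting, concave utility.
phi = lambda p: (math.exp(5 * p) - 1) / (math.exp(5) - 1)
u = lambda c: c ** 0.2

A = {10: 1.0}
B = {50: 10/11, 0: 1/11}
C = {10: 11/100, 0: 89/100}
D = {50: 10/100, 0: 90/100}

print(rdu(A, u, phi) > rdu(B, u, phi))  # True: A preferred to B
print(rdu(D, u, phi) > rdu(C, u, phi))  # True: D preferred to C
```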

SLIDE 17

Expression of the RDU criterion: the general case

  • In EU, the utility U(P) of the probability distribution P of random variable c can be written in the equivalent form:
    U(P) = ∫_C u(c) dP = ∫_{−∞}^{0} [P(u(c) > t) − 1] dt + ∫_{0}^{+∞} P(u(c) > t) dt

  • Operating transformation ϕ on the decumulative function we get
    V(P) = ∫_{−∞}^{0} [ϕ(P(u(c) > t)) − 1] dt + ∫_{0}^{+∞} ϕ(P(u(c) > t)) dt

  • As we shall see later, this is a particular case of a CHOQUET integral, where the capacity involved is a probability transform.
SLIDE 18

The ELLSBERG Experiment

ELLSBERG (1961)

  • An urn contains 90 balls: 30 are red, and 60 are blue or yellow (in unknown proportions). A ball is to be drawn at random (event R = ”a Red ball is drawn”; etc.), and the DM has to compare: (i) alternatives fR and fB; (ii) alternatives fR∪Y and fB∪Y; with:
    fR: win M conditionally on event R; fB: win M conditionally on event B;
    fR∪Y: win M conditionally on event R ∪ Y; fB∪Y: win M conditionally on event B ∪ Y.

  • ELLSBERG observed the predominant preference pattern: fR ≻ fB and fB∪Y ≻ fR∪Y

SLIDE 19

The ELLSBERG Paradox

Inconsistency of the preference pattern fR ≻ fB and fB∪Y ≻ fR∪Y with RDU (hence with SEU)

  • If P is the subjective probability, ϕ the weighting function, and u the utility, chosen such that u(0) = 0 and u(M) = 1, then
    V(fR) = ϕ(P(R)), V(fB) = ϕ(P(B)), V(fB∪Y) = ϕ(P(B ∪ Y)), V(fR∪Y) = ϕ(P(R ∪ Y));
    hence fR ≻ fB =⇒ ϕ(P(R)) > ϕ(P(B)) =⇒ P(R) > P(B),
    whereas fB∪Y ≻ fR∪Y =⇒ ϕ(P(B ∪ Y)) > ϕ(P(R ∪ Y)) =⇒ P(B ∪ Y) > P(R ∪ Y) =⇒ P(B) > P(R).

  • Alternative models are required to accommodate this dominant pattern!
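The contradiction can also be seen by brute force: since Y is disjoint from both R and B, P(B ∪ Y) − P(R ∪ Y) = P(B) − P(R) for every probability P, so no single prior supports the modal pattern. A quick enumeration over the urn's feasible compositions (a sketch):

```python
# Enumerate urn compositions: 30 red, b blue, 60 - b yellow, b = 0..60.
found = False
for b in range(61):
    pR, pB, pY = 30 / 90, b / 90, (60 - b) / 90
    if pR > pB and pB + pY > pR + pY:   # the modal ELLSBERG pattern
        found = True
print(found)  # False: no single prior rationalizes the pattern
```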

SLIDE 20

A closer look at the ELLSBERG Urn

  • The urn has a precise, although only partially known, content

  • There is a probability P0 on the family of events {∅, R, B, Y, R ∪ Y, R ∪ B, B ∪ Y, R ∪ B ∪ Y}

  • Probability P0 is known to satisfy certain constraints, such as 1/3 ≤ P0(R ∪ B) ≤ 1

  • In fact, the set P of all probabilities consistent with the data is characterizable as P = {P ∈ L : P(R) = 1/3}

  • It moreover has the property P = {P ∈ L : P ≥ f}, where f = inf_{P∈P} P (lower probability)

  • Qu: What are the general properties of lower/upper probabilities? Are they useful for characterizing sets of probabilities? Can they be embedded into decision models under uncertainty?
SLIDE 21

Capacities and lower probabilities

  • A (normalized) capacity on a measurable space (Ω, E) is a set function v satisfying v(∅) = 0, v(Ω) = 1, which is monotone, i.e. ∀A, B ∈ E, A ⊂ B ⇒ v(A) ≤ v(B)

  • A capacity is convex (or supermodular) when ∀A, B ∈ E, v(A ∪ B) + v(A ∩ B) ≥ v(A) + v(B)

  • Given a non-empty set P of probabilities on (Ω, E), its lower probability f = inf_{P∈P} P is a capacity, but generally not a convex one

  • The core of a capacity v is the set of all probabilities which dominate v: core v = {P ∈ L : ∀A ∈ E, P(A) ≥ v(A)}

  • The core of a convex capacity is non-empty; moreover v = inf_{P∈core v} P
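These definitions can be checked directly on a small example. The sketch below encodes the lower envelope of the ELLSBERG urn (P(R) = 1/3 known, P(B) free in [0, 2/3]) as a set function on {R, B, Y} and verifies monotonicity and convexity by enumeration; this is an illustration on one capacity, not a general proof:

```python
from itertools import combinations

# All events over Omega = {'R', 'B', 'Y'}, coded as frozensets.
events = [frozenset(s) for r in range(4) for s in combinations('RBY', r)]

def f(A):
    """Lower probability of event A: R contributes 1/3 surely; B and Y
    carry the remaining 2/3 only jointly."""
    return (1/3 if 'R' in A else 0) + (2/3 if {'B', 'Y'} <= A else 0)

def is_monotone(v):
    return all(v(A) <= v(B) + 1e-12 for A in events for B in events if A <= B)

def is_convex(v):
    return all(v(A | B) + v(A & B) >= v(A) + v(B) - 1e-12
               for A in events for B in events)

print(is_monotone(f), is_convex(f))  # True True
```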

SLIDE 22

The CHOQUET integral

  • For any measurable mapping X from (Ω, E) into R, the CHOQUET integral of X w.r.t. capacity v is defined as
    ∫Ch X dv = ∫_{−∞}^{0} [v(X > t) − 1] dt + ∫_{0}^{+∞} v(X > t) dt

  • Mappings X and Y are comonotonic when ∀ω, ω′ ∈ Ω, [X(ω) − X(ω′)][Y(ω) − Y(ω′)] ≥ 0

  • Consider a functional I on the set X of all measurable mappings X from (Ω, E); suppose I(1_Ω) = 1; if moreover: (i) I is additive on comonotonic mappings [X, Y comonotonic ⇒ I(X + Y) = I(X) + I(Y)]; (ii) I is monotonic (X ≥ Y ⇒ I(X) ≥ I(Y)); then ∀X ∈ X, I(X) = ∫Ch X dv, where ∀A ∈ E, v(A) = I(1_A)

  • Conversely, any CHOQUET integral satisfies (i) and (ii).
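On a finite Ω the Choquet integral reduces to a telescoping sum over the ordered values of X. A minimal Python sketch for nonnegative X (the three-state example and its numbers are illustrative assumptions):

```python
def choquet(X, v):
    """Choquet integral of a nonnegative mapping X (dict state -> value)
    w.r.t. capacity v, via the telescoping sum over ordered values."""
    states = sorted(X, key=X.get)        # x_(1) <= ... <= x_(n)
    total, prev = 0.0, 0.0
    for i, w in enumerate(states):
        upper = frozenset(states[i:])    # event {X >= x_(i)}
        total += (X[w] - prev) * v(upper)
        prev = X[w]
    return total

# Sanity check: with an additive capacity (a probability), the Choquet
# integral collapses to the ordinary expectation.
P = {'a': 0.2, 'b': 0.3, 'c': 0.5}
v_add = lambda A: sum(P[w] for w in A)
X = {'a': 4.0, 'b': 1.0, 'c': 2.0}
print(choquet(X, v_add))   # expectation 0.2*4 + 0.3*1 + 0.5*2 = 2.1
```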

SLIDE 23

The Comonotonic Sure Thing Principle

  • Let comonotonic step-acts f, f′ ∈ A satisfy f(E_i) = {c_i} and f′(E_i) = {c′_i} for the same E_i ∈ E, i ∈ I, so that:
    c_1 ≼_C .. ≼_C c_i ≼_C .. ≼_C c_n and c′_1 ≼_C .. ≼_C c′_i ≼_C .. ≼_C c′_n.
    Suppose moreover that c_i0 = c′_i0 for some i0.

  • If this common consequence c_i0 = c′_i0 of acts f and f′ is modified and replaced by another common consequence c̃_i0 = c̃′_i0 with the same C-ranks, i.e., such that c_{i0−1} ≼_C c̃_i0 ≼_C c_{i0+1} and c′_{i0−1} ≼_C c̃′_i0 ≼_C c′_{i0+1}, then the new acts f̃ and f̃′ satisfy f̃ ≿ f̃′ ⇐⇒ f ≿ f′.
SLIDE 24

Choquet Expected Utility (CEU) Theory

D. SCHMEIDLER (1989)

  • Under the Comonotonic Sure Thing Principle and additional standard axioms (weak ordering, continuity, point-wise dominance), it can be shown that ≿ is representable by a cardinal utility function V(·). Two parameters are involved: (i) a utility u(·) : C → R representing preference under certainty, which, like the vNM utility of LU theory, is cardinal; (ii) a capacity v(·) : E → [0, 1], which is unique.

  • For f ∈ A,
    V(f) = ∫Ch u(f(·)) dv =DEF ∫_{−∞}^{0} [v({ω : u(f(ω)) > t}) − 1] dt + ∫_{0}^{+∞} v({ω : u(f(ω)) > t}) dt,
    where ∫Ch h(·) dv is the Choquet integral of the function h(·) with respect to capacity v(·).

SLIDE 25

The Multiple Prior Model

I. GILBOA, D. SCHMEIDLER (1989)

  • Another axiomatic system leads to the representation of ≿ by a utility V(·) of the form
    V(f) = inf_{P∈P} ∫ u(f(·)) dP
    Interpretation: the DM feels able to specify a set P of subjective probabilities consistent with his information on the events, but not to identify one of them as his precise subjective probability.

  • Relation with the CEU criterion. If v is a convex capacity, then:
    (i) core v =DEF {P : P ∈ L and P(F) ≥ v(F) for all F ∈ E} ≠ ∅;
    (ii) v = inf_{P∈core v} P, i.e., ∀E ∈ E, v(E) = inf{P(E) : P ∈ core v};
    (iii) ∫Ch u(f(·)) dv = inf_{P∈core v} ∫ u(f(·)) dP.
    Thus, in CEU theory, a convex capacity may be interpreted as reflecting total pessimism (also called total ambiguity aversion).
SLIDE 26

Generalized EU under Regular Uncertainty (I)

J.Y. JAFFRAY (1989); F. PHILIPPE, G. DEBS, J.Y. JAFFRAY (1999)

Regular uncertainty is a situation where (objective) ambiguity concerning the events is characterizable by a convex lower probability f

  • Case of a finite algebra of events
    Linear Utility Theory, transposed to convex lower probabilities f(d) (images of f generated by decisions d) instead of probabilities, leads to the Generalized EU (GEU) functional
    V(f(d)) = Σ_B ϕ(B) · v(m_B, M_B)
    where ϕ is the MOEBIUS transform of f, and m_B and M_B are the worst and best consequences of d on B

  • Particular case: v(m_B, M_B) = α u(m_B) + [1 − α] u(M_B)
    V(f(d)) = α inf_{P∈core f} E_P[u ∘ d] + [1 − α] sup_{P∈core f} E_P[u ∘ d]
    This is the HURWICZ criterion with pessimism index α
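The HURWICZ particular case can be sketched on the ELLSBERG urn. Since expected utility is linear in the free parameter P(B), the infimum and supremum over the core are attained at the two extreme priors used below; the values α = 0.8 and the linear utility are illustrative assumptions:

```python
def hurwicz(act, u, alpha, priors):
    """alpha * worst + (1 - alpha) * best expected utility over priors."""
    eus = [sum(p[w] * u(act[w]) for w in act) for p in priors]
    return alpha * min(eus) + (1 - alpha) * max(eus)

# Extreme priors of the ELLSBERG core (P(R) = 1/3, P(B) in [0, 2/3]).
priors = [{'R': 1/3, 'B': 0.0, 'Y': 2/3},
          {'R': 1/3, 'B': 2/3, 'Y': 0.0}]
u = lambda c: c        # illustrative linear utility (assumed)
alpha = 0.8            # a pessimistic DM (assumed)

f_R  = {'R': 1, 'B': 0, 'Y': 0}   # win 1 on R
f_B  = {'R': 0, 'B': 1, 'Y': 0}   # win 1 on B
f_BY = {'R': 0, 'B': 1, 'Y': 1}   # win 1 on B or Y
f_RY = {'R': 1, 'B': 0, 'Y': 1}   # win 1 on R or Y

# The criterion rationalizes the dominant ELLSBERG pattern:
print(hurwicz(f_R, u, alpha, priors) > hurwicz(f_B, u, alpha, priors))    # True
print(hurwicz(f_BY, u, alpha, priors) > hurwicz(f_RY, u, alpha, priors))  # True
```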

SLIDE 27

Generalized EU under Regular Uncertainty (II)

  • The preceding results extend to the case of an infinite algebra of events (in a suitable topological setting), provided the convex lower probability f is assumed to be an ∞-monotone capacity.

  • If the functional V is moreover required to be a Choquet integral (compatibility with CEU), then again [HURWICZ criterion]
    V(f(d)) = α inf_{P∈core f} E_P[u ∘ d] + [1 − α] sup_{P∈core f} E_P[u ∘ d]

  • Furthermore, if a DM uses: (i) the CEU criterion in any situation of uncertainty, for some capacity v̂; (ii) the GEU criterion under regular uncertainty; then there is a subjective lower probability f̂ on all events (which extends f), v̂ = α f̂ + [1 − α] F̂ (F̂, dual of f̂), and the HURWICZ criterion applies.

SLIDE 28

Some Remarks about Conditioning (I)

  • There are diverse forms of information acquisition.

  • Example 1. Suppose all your present information concerning a medical problem comes from a single source: a data base. (i) You may acquire new information which will increase or alter the data base (e.g., results of new medication tests); (ii) You may learn that only part of the data are relevant (e.g., your patients will all be children)

  • Example 2. Consider a standard ELLSBERG urn. You have to bet on the colour of a ball which has already been drawn; you have not seen it, but you have been told that it is not yellow. Additional information may be that: (i) this ball is the first ball drawn from the urn; (ii) balls have been drawn with replacement (you do not know how many times) until a non-yellow ball was drawn.

SLIDE 29

Some Remarks about Conditioning (II)

  • In both examples, (ii), unlike (i), does not bring general information about the characteristics of the whole population; it tells you that you are only concerned with the characteristics of a subpopulation; (i) is an instance of updating, (ii) an instance of focusing.

  • Bayesian models (EU, SEU), as well as all models involving a unique (subjective or objective) probability (e.g., RDU), use the same conditioning rule (Bayes’ rule) in all cases.

  • There is no reason to believe that the same should be true of other models

  • Conditional decisions are not necessarily determined on the basis of closed-form conditioning rules

SLIDE 30

Conditioning rules for multiple priors/lower probabilities (I): h-Bayesian Update Rules

I.GILBOA, D.SCHMEIDLER (1993)

  • Given unconditional preference ≿, an act h and an event A, conditional preference ≿_A is the h-Bayesian update if and only if f ≿_A g ⇐⇒ fAh ≿ gAh (act fAh coincides with f on A and with h on Aᶜ)

  • When applied with h = c*, the most preferred consequence in C, this leads:
    In the CEU model, to the DEMPSTER-SHAFER rule
    v(B/A) = [v(B ∪ Aᶜ) − v(Aᶜ)] / [1 − v(Aᶜ)]
    In the multiple prior model, to the Maximum Likelihood Update (MLU) rule: the set of conditional priors is composed only of the (Bayesian) conditionals of those priors which give the highest probability to event A.
    Clearly, this rule is more suited to updating than to focusing
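A minimal sketch of the DEMPSTER-SHAFER rule applied to the ELLSBERG lower probability, conditioning on the event "the drawn ball is not yellow"; the encoding of events as frozensets is an implementation choice:

```python
def ds_conditional(v, B, A, Omega):
    """DEMPSTER-SHAFER update: [v(B | Ac) - v(Ac)] / [1 - v(Ac)],
    where | here denotes set union with the complement Ac of A."""
    Ac = Omega - A
    return (v(B | Ac) - v(Ac)) / (1 - v(Ac))

Omega = frozenset('RBY')

def f(E):  # ELLSBERG lower probability (P(R) = 1/3 known)
    return (1/3 if 'R' in E else 0) + (2/3 if {'B', 'Y'} <= E else 0)

A = frozenset('RB')   # told: the drawn ball is not yellow
low_R = ds_conditional(f, frozenset('R'), A, Omega)   # = f(R u Y) = 1/3
low_B = ds_conditional(f, frozenset('B'), A, Omega)   # = f(B u Y) = 2/3
print(low_R, low_B)
```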

SLIDE 31

Conditioning rules for multiple priors/lower probabilities (II): Generalized Bayes or Full Bayesian Update Rule

P. WALLEY (1981); R. FAGIN, J. HALPERN (1989)

  • In a multiple prior setting, ”Full Bayesian” expresses the fact that the set of conditional priors consists of all the (Bayesian) conditionals of the unconditional priors

  • It implies the following conditioning rule for lower probabilities:
    v(B/A) = v(B ∩ A) / [v(B ∩ A) + 1 − v(B ∪ Aᶜ)]

  • It corresponds to the following relation between unconditional preference ≿ and conditional preference ≿_A: for any act f, there is a unique constant act h such that fAh ∼ h; then ≿_A is induced from the requirement that f ∼_A h.
    Clearly, this rule is more suited to focusing than to updating
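The same ELLSBERG example under the Full Bayesian rule, for contrast with the DEMPSTER-SHAFER rule: here the conditional lower probability of B drops to 0, since some prior in the set gives B probability 0 even after learning "not yellow" (a sketch; event encoding as frozensets is an implementation choice):

```python
def full_bayes_conditional(v, B, A, Omega):
    """Full Bayesian update: v(B n A) / [v(B n A) + 1 - v(B u Ac)]."""
    Ac = Omega - A
    num = v(B & A)
    return num / (num + 1 - v(B | Ac))

Omega = frozenset('RBY')

def f(E):  # ELLSBERG lower probability (P(R) = 1/3 known)
    return (1/3 if 'R' in E else 0) + (2/3 if {'B', 'Y'} <= E else 0)

A = frozenset('RB')   # told: the drawn ball is not yellow
low_R = full_bayes_conditional(f, frozenset('R'), A, Omega)   # 1/3
low_B = full_bayes_conditional(f, frozenset('B'), A, Omega)   # 0.0
print(low_R, low_B)
```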

SLIDE 32

Dynamic Consistency

The strategy preferred in A generates a sub-strategy with root B which is itself the strategy preferred in B

[Figure: a decision tree with an initial decision node A (Up/Down) and a subsequent decision node B (Up/Down), with terminal payoffs 20, 25 and 10]

SLIDE 33

Consequentialism

Preferences in the subtree with root B do not depend on the rest of the tree (structure; data)

[Figure: the same decision tree, with the subtree rooted at B isolated from the rest of the tree; terminal payoffs 20, 25 and 10]

SLIDE 34

MIN+MAX as a particular case of HURWICZ’s criterion

HURWICZ criterion with pessimism index α:
V(f(d)) = α inf_{P∈core f} E_P[u ∘ d] + [1 − α] sup_{P∈core f} E_P[u ∘ d]

Assume complete ignorance: core f = L; α = 1/2; u(c) = 2c.

Thus U(d) = min_{ω∈Ω} d(ω) + max_{ω∈Ω} d(ω)

SLIDE 35

Incompatibility between Consequentialism and Dynamic Consistency when Preference is Non-EU

Criterion: MIN + MAX.
If the criterion is used both in A and in B: Consequentialism, but not Dynamic Consistency.
If the criterion is used only in A and induces choice in B: Dynamic Consistency, but not Consequentialism.

[Figure: the decision tree again, with terminal payoffs 20, 25 and 10]
35 / 1