

SLIDE 1

Interactions and dynamics: some aspects of repeated zero-sum games

Sylvain Sorin
Laboratoire d'Econométrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris
and Equipe Combinatoire, UFR 921, Université P. et M. Curie - Paris 6, 175 Rue du Chevaleret, 75013 Paris, France
sorin@poly.polytechnique.fr

Winter School on Complex Systems, December 9-13, 2002, Ecole Normale Supérieure de Lyon

SLIDE 2

Contents

1 Introduction
  1.1 Zero-sum games
  1.2 Repetition, information and interaction
  1.3 Evaluation: asymptotic approach, uniform approach
2 Stochastic games
  2.1 Description
  2.2 Results
3 Incomplete information games
  3.1 Description
  3.2 Results
4 Recursive structure and discrete dynamics
  4.1 Representation of a game with incomplete information as a stochastic game
  4.2 General repeated game
  4.3 Recursive formula
  4.4 Examples
    4.4.1 Stochastic games
    4.4.2 Incomplete information games
5 Operator approach
6 Uniform approach
7 Open problems
8 References

SLIDE 3

1 Introduction

1.1 Zero-sum games

           1/4   3/4
    1/2  (  2     0 )
    1/2  ( −1     1 )        v = 1/2

Minmax theorem (von Neumann): Let A be an I×J matrix. There exist v ∈ IR, x ∈ ∆(I), y ∈ ∆(J) such that

    xAy′ ≥ v for all y′ ∈ ∆(J)
    x′Ay ≤ v for all x′ ∈ ∆(I)

In the example, the margins give the optimal strategies x = (1/2, 1/2) and y = (1/4, 3/4), which guarantee the value v = 1/2.
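The theorem is effective: the value and an optimal strategy solve a linear program. A minimal sketch (assuming scipy is available; `matrix_game_value` is a hypothetical helper name, and the matrix is the 2×2 example above):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and optimal row strategy x of the zero-sum game with payoff matrix A.

    Solves: max v  s.t.  sum_i x_i A[i][j] >= v for every column j,
            x in the simplex.  Decision variables are (x_1, ..., x_m, v).
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])          # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])          # v - sum_i x_i A[i][j] <= 0, each column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # x sums to 1
    bounds = [(0, None)] * m + [(None, None)]          # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[m], res.x[:m]

v, x = matrix_game_value([[2, 0], [-1, 1]])
print(v, x)  # value 1/2, optimal strategy (1/2, 1/2)
```

The dual LP yields the column player's optimal strategy; here it is (1/4, 3/4), matching the margins above.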

1.2 Repetition, information and interaction

Repetition allows for:

  • Coordination
  • Threats

as a function of the information available along the play. In the zero-sum case, the impact is only through the evolution of a jointly controlled state variable.

1.3 Evaluation: asymptotic approach, uniform approach

sequence of stage payoffs gn, n = 1, 2, . . .

  • asymptotic approach:

for each averaging rule θ, a value vθ; study the limiting behavior of the family vθ

SLIDE 4
  • uniform approach:

properties independent of the (long) duration of the interaction

SLIDE 5

2 Stochastic games

2.1 Description

Finite two-person zero-sum stochastic game:

  • state space Ω
  • action spaces I and J
  • payoff function g from Ω×I×J to IR
  • initial state, ω1, known to both players
  • at each stage t + 1, a transition Q(·|ωt, it, jt)∈∆(Ω)

determines the law of the new state ωt+1, which is announced to both players.

X = ∆(I), Y = ∆(J); g and Q are extended by bilinearity to X×Y.

Example (the "Big Match"; ∗ denotes an absorbing payoff):

         α    β
    a    1∗   0∗
    b    0    1

SLIDE 6

2.2 Results

Shapley's Theorem (1953): The value vλ of the λ-discounted game is the unique fixed point of the operator f → Φ(λ, f) from IR^Ω to itself:

    Φ(λ, f)(ω) = valX×Y{ λ g(ω, x, y) + (1 − λ) ∫_Ω f(ω′) Q(dω′|ω, x, y) }

where valX×Y stands for the value operator:

    valX×Y = maxX minY = minY maxX.

Bewley and Kohlberg (1976a, 1976b): Algebraic approach: vλ has an expansion in Puiseux series, hence limλ→0 vλ exists and limn→∞ vn = limλ→0 vλ.

Mertens and Neyman (1981): General stochastic game: vλ of bounded variation implies lim vn = lim vλ (and the existence of v∞ under standard signalling).

Lehrer and Sorin (1992): Markov decision process: uniform convergence of vλ is equivalent to uniform convergence of vn, and the limits are the same. There is an example with Ω countable where both limits exist and differ.
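Shapley's fixed-point characterization suggests value iteration on Φ(λ, ·), a contraction of modulus 1 − λ. A sketch (assuming scipy is available; the game is the Big Match above, modeled with three states: ongoing, absorbed at payoff 1, absorbed at payoff 0; for this game the fixed point gives vλ = 1/2 at the ongoing state for every λ):

```python
import numpy as np
from scipy.optimize import linprog

def val(A):
    """Value of the zero-sum matrix game A (row player maximizes), via LP."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[m]

# Big Match as a 3-state stochastic game: 0 = ongoing, 1 = absorbed at 1, 2 = absorbed at 0.
g = np.array([[[1.0, 0.0], [0.0, 1.0]],   # state 0: stage payoffs of (a, b) vs (alpha, beta)
              [[1.0, 1.0], [1.0, 1.0]],   # state 1: constant payoff 1
              [[0.0, 0.0], [0.0, 0.0]]])  # state 2: constant payoff 0
Q = np.array([[[1, 2], [0, 0]],           # Q[w][i][j] = next state (deterministic here)
              [[1, 1], [1, 1]],
              [[2, 2], [2, 2]]])

def shapley_iterate(lam, iters=200):
    """Iterate f <- Phi(lam, f), i.e. f(w) = val(lam*g(w) + (1-lam)*f(next state))."""
    f = np.zeros(3)
    for _ in range(iters):
        f = np.array([val(lam * g[w] + (1 - lam) * f[Q[w]]) for w in range(3)])
    return f

v = shapley_iterate(0.2)
print(v)  # approximately [0.5, 1.0, 0.0]
```

The optimal one-stage strategy at the ongoing state, x = (λ/(1+λ), 1/(1+λ)), depends on λ; only the value is constant in λ, which is what makes the uniform analysis of the Big Match delicate.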

SLIDE 7

3 Incomplete information games

3.1 Description

Two-person zero-sum repeated games with incomplete information, Aumann and Maschler (1995). Simple case: independent information and standard signalling.

  • parameter space: K×L
  • endowed with a product probability π = p⊗q ∈ ∆(K)×∆(L) according to which (k, ℓ) is chosen
  • k is told to Player 1 and ℓ to Player 2, hence the players have partial private information on the parameter (k, ℓ), which is fixed for the duration of the play
  • after each stage t the players are told the previous moves (it, jt)

A one-stage strategy of Player 1 is an element x in X = ∆(I)K (resp. y in Y = ∆(J)L for Player 2).

SLIDE 8

3.2 Results

Aumann and Maschler (1966-68): Lack of information on one side:

    lim vn = lim vλ = v (= v∞)

Mertens and Zamir (1971-72): Lack of information on both sides:

    lim vn = lim vλ = v

Characterization of v: existence and uniqueness of the solution of the functional equations

    v = Cavp min(u, v)
    v = Vexq max(u, v)

where u is the value of the non-revealing game (neither player transmits, i.e. uses, his own information) and Cav (resp. Vex) is the concavification (resp. convexification) operator: given f from a convex set C to IR, CavC f is the smallest concave function greater than f on C.
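On a grid, Cav can be computed as an upper concave envelope, i.e. the upper convex hull of the graph. A sketch (plain Python; the non-revealing value u(p) = (p − 1/2)² on ∆({0,1}) ≅ [0, 1] is illustrative, not taken from the lectures; since this u is convex with u(0) = u(1) = 1/4, its concavification is the constant 1/4, the chord between the endpoints):

```python
def cav(ps, us):
    """Upper concave envelope of the points (ps[i], us[i]), ps strictly increasing.

    Builds the upper convex hull (monotone chain, keeping only clockwise turns),
    then evaluates the piecewise-linear envelope back on the grid.
    """
    pts = list(zip(ps, us))
    hull = []
    for p in pts:
        # Pop while the last two hull points and p do not make a right turn.
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    out, k = [], 0
    for x, _ in pts:
        while k + 2 < len(hull) and hull[k + 1][0] <= x:
            k += 1
        (x0, y0), (x1, y1) = hull[k], hull[k + 1]
        t = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
        out.append(y0 + t * (y1 - y0))
    return out

n = 100
ps = [i / n for i in range(n + 1)]
us = [(p - 0.5) ** 2 for p in ps]   # hypothetical non-revealing value u
cs = cav(ps, us)
print(cs[n // 2])  # Cav u(1/2) = 1/4 > u(1/2) = 0: revealing nothing is not optimal here
```

For a concave u the envelope returns u itself, reflecting that a concave non-revealing value means information is not worth using.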

SLIDE 9

4 Recursive structure and discrete dynamics

4.1 Representation of a game with incomplete information as a stochastic game

  • state space: χ = ∆(K)×∆(L) (beliefs of the players on the parameter along the play)

Recall that a one-stage strategy of Player 1 is an element x in X = ∆(I)K (resp. y in Y = ∆(J)L for Player 2).

  • transition: Π : χ×X×Y → ∆(χ) with

    Π((p(i), q(j)) | (p, q), x, y) = x(i) y(j)

where p(i) is the conditional probability on K given the move i and x(i) is the probability of this move (similarly y(j) for Player 2). Explicitly:

    x(i) = Σ_k p^k x^k_i   and   p^k(i) = p^k x^k_i / x(i).
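The transition of this auxiliary stochastic game is Bayesian updating of the beliefs. A minimal sketch (plain Python; `update` is a hypothetical helper, and the two-type prior and strategy are made up for illustration); it computes the total move probabilities x(i), the posteriors p(i), and checks the martingale property Σ_i x(i) p(i) = p:

```python
def update(p, x):
    """Bayesian update of the belief on K after Player 1's move.

    p : prior, p[k] = probability of type k
    x : one-stage strategy, x[k][i] = probability that type k plays move i
    Returns (xbar, posts): xbar[i] = total probability of move i,
    posts[i][k] = posterior probability of type k given move i.
    """
    K, I = len(p), len(x[0])
    xbar = [sum(p[k] * x[k][i] for k in range(K)) for i in range(I)]
    posts = [[p[k] * x[k][i] / xbar[i] if xbar[i] > 0 else p[k] for k in range(K)]
             for i in range(I)]
    return xbar, posts

p = [0.5, 0.5]                 # uniform prior on two types
x = [[0.9, 0.1], [0.1, 0.9]]   # a revealing strategy: type k mostly plays move k
xbar, posts = update(p, x)
print(xbar, posts)             # moves are 50/50; each move makes its type 90% likely

# Martingale property: the expected posterior is the prior.
barycenter = [sum(xbar[i] * posts[i][k] for i in range(len(xbar))) for k in range(len(p))]
print(barycenter)              # equals the prior [0.5, 0.5]
```

With a non-revealing strategy (x^k independent of k) the posterior equals the prior, which is the situation where the value u of the non-revealing game is the relevant benchmark.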

SLIDE 10

4.2 General repeated game

  • parameter space M
  • action spaces I and J for Player 1 and 2 respectively
  • payoff function g from I×J×M to IR
  • signal sets A and B

(Assume all sets finite, avoiding measurability issues)

  • initial position: parameter m1, signal a1 (resp. b1) for Player 1 (resp. Player 2), drawn according to π, a probability on M×A×B
  • transition Q from M×I×J to probabilities on M×A×B: at stage t, given the state mt and the moves (it, jt),

    (mt+1, at+1, bt+1) ∼ Q(mt, it, jt)

  • play of the game: m1, a1, b1, i1, j1, m2, a2, b2, i2, j2, . . .
  • information of Player 1 before his play at stage t: a private history of the form (a1, i1, a2, i2, . . ., at) (similarly for Player 2)
  • sequence of payoffs: g1, g2, . . ., gt, . . . with gt = g(it, jt, mt)
  • strategy for Player 1: σ, a map from private histories to ∆(I), the probabilities on the set I of actions; τ is defined similarly for Player 2

SLIDE 11

A couple (σ, τ) induces, together with the components of the game π and Q, a distribution Pσ,τ on plays, hence on the sequence of payoffs. Consider:

1) the finite n-stage game Γn with payoff given by the average of the first n rewards:

    γn(σ, τ) = Eσ,τ( (1/n) Σ_{t=1}^n gt )

2) the λ-discounted game Γλ with payoff equal to the discounted sum of the rewards:

    γλ(σ, τ) = Eσ,τ( Σ_{t=1}^∞ λ(1 − λ)^{t−1} gt )

The values of these games are denoted by vn and vλ respectively. The analysis of their asymptotic behavior, as n goes to ∞ or λ goes to 0, is the study of the asymptotic game.
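Both evaluations are probability-weighted averages of the payoff stream: uniform weights 1/n for γn, geometric weights λ(1 − λ)^{t−1} (which sum to 1) for γλ. A quick sketch (plain Python; the deterministic alternating payoff stream is chosen for illustration):

```python
def gamma_n(g, n):
    """Average of the first n stage payoffs."""
    return sum(g[:n]) / n

def gamma_lambda(g, lam):
    """lam-discounted evaluation of a (long, truncated) payoff stream."""
    return sum(lam * (1 - lam) ** t * gt for t, gt in enumerate(g))

g = [1.0 if t % 2 == 0 else 0.0 for t in range(10_000)]  # payoffs 1, 0, 1, 0, ...

print(gamma_n(g, 1000))       # Cesaro average: 1/2
print(gamma_lambda(g, 0.1))   # discounted value 1/(2 - lam); tends to 1/2 as lam -> 0
print(sum(0.1 * 0.9 ** t for t in range(10_000)))  # geometric weights sum to about 1
```

Here both evaluations agree in the limit, but the uniform approach of Section 6 asks for much more: a single strategy good simultaneously for all long horizons.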

SLIDE 12

4.3 Recursive formula

The recursive structure relies on the construction of the universal belief space, Mertens and Zamir (1985): the infinite hierarchy of beliefs on M is canonically represented by Ξ = M×Θ1×Θ2, where Θi, homeomorphic to ∆(M×Θ−i), is the type set of Player i.

An information scheme is a probability on M×A×B (parameter × signals). It induces a consistent distribution Q on Ξ: for any Borel subset B of Ξ,

    Q(B) = ∫_Ξ θi(ζ)(B) Q(dζ)

where θi is the canonical projection from Ξ to Θi.

  • the strategies of the players and the signalling structure in the game, before the moves at stage t, define a probability on t-histories, hence an information scheme, thus a consistent distribution on Ξ: the entrance law Pt
  • Pt and the (behavioral) strategies at stage t (maps from types to mixed actions, αt : Θ1 → ∆(I) for Player 1, resp. βt for Player 2) determine the current payoff gt and the new entrance law Pt+1 = L(Pt, αt, βt)
  • the stationary aspect of the repeated game is expressed by the fact that L does not depend on the stage t

SLIDE 13

The Shapley operator maps the set of real bounded functions defined on the space of consistent probabilities (in ∆(Ξ)) to itself:

    Ψ(f)(P) = valα×β{ g(P, α, β) + f(L(P, α, β)) }

Mertens, Sorin and Zamir (1994), Sections III.1, III.2, IV.3:

    n vn = Ψ^n(0),    vλ/λ = Ψ( (1 − λ) vλ/λ ).

Problems: asymptotic behavior of vλ as λ → 0 and of vn as n → ∞. Convergence? Convergence to the same limit w?

4.4 Examples

4.4.1 Stochastic games

Ψ operates on IR^Ω:

    Ψ(f)(ω) = valX×Y{ g(ω, x, y) + ∫_Ω f(ω′) Q(dω′|ω, x, y) }

4.4.2 Incomplete information games

Ψ is an operator on the set of real bounded saddle (concave/convex) functions on χ:

    Ψ(f)(p, q) = valX×Y{ g(p, q, x, y) + ∫_χ f(p′, q′) Π(d(p′, q′)|(p, q), x, y) }

with

    g(p, q, x, y) = Σ_{k,ℓ} p^k q^ℓ g(k, ℓ, x^k, y^ℓ).

SLIDE 14

5 Operator approach

Consider mappings Ψ satisfying:

  • domain: F, a cone of bounded real functions on Ω containing the constants
  • properties:

(A) Monotonicity:

    f ≥ g ⇒ Ψ(f) ≥ Ψ(g).   (1)

(B) Reduction of constants:

    Ψ(f + a) ≤ Ψ(f) + a, ∀a ≥ 0.   (2)

In particular Ψ is nonexpansive. Recall

    vn = (1/n) Ψ((n − 1) vn−1) = (1/n) Ψ^n(0)
    vλ = λ Ψ( ((1 − λ)/λ) vλ )

Introduce, for 1 > ε > 0, a family of operators Φ(ε, ·):

    Φ(ε, x) = ε Ψ( ((1 − ε)/ε) x )   (3)

Then

    vn = Φ(1/n, vn−1),    vλ = Φ(λ, vλ).   (4)

Conditions for the convergence of both families to the same function rely on the study of

    ϕ(f)(ω) = lim_{ε→0+} [ Φ(ε, f)(ω) − Φ(0, f)(ω) ] / ε.
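Monotonicity plus reduction of constants forces nonexpansiveness: f ≤ g + ‖f − g‖∞ gives Ψ(f) ≤ Ψ(g) + ‖f − g‖∞, and symmetrically. A numerical sanity check on the matrix-game value operator, which satisfies (A) and (B) in its payoff argument (assuming scipy is available; random matrices stand in for Ψ's argument):

```python
import numpy as np
from scipy.optimize import linprog

def val(A):
    """Value of the zero-sum matrix game A (row player maximizes), via LP."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[m]

rng = np.random.default_rng(0)
for _ in range(20):
    A = rng.uniform(-1, 1, (3, 4))
    B = rng.uniform(-1, 1, (3, 4))
    # (A) Monotonicity: entrywise smaller matrix has smaller value.
    assert val(np.minimum(A, B)) <= val(np.maximum(A, B)) + 1e-8
    # (B) Constants: val(A + a) = val(A) + a.
    assert abs(val(A + 0.3) - (val(A) + 0.3)) < 1e-7
    # Hence nonexpansiveness: |val(A) - val(B)| <= max |A - B|.
    assert abs(val(A) - val(B)) <= np.abs(A - B).max() + 1e-8
print("monotonicity, reduction of constants and nonexpansiveness verified")
```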

SLIDE 15

6 Uniform approach

Recall

    γn(σ, τ) = Eσ,τ( (1/n) Σ_{t=1}^n gt )

v is the maxmin if the two following conditions are satisfied:

  • Player 1 can guarantee v: for any ε > 0, there exists a strategy σ of Player 1 and an integer N such that for any n ≥ N and any strategy τ of Player 2:

    γn(σ, τ) ≥ v − ε

(It follows from the uniformity in τ that if Player 1 can guarantee f, both lim inf_{n→∞} vn and lim inf_{λ→0} vλ will be greater than f.)

  • Player 2 can defend v: for any ε > 0 and any strategy σ of Player 1, there exist an integer N and a strategy τ of Player 2 such that for all n ≥ N:

    γn(σ, τ) ≤ v + ε.

(Note that satisfying this requirement is stronger than contradicting the previous condition; hence the existence of v is an issue.)

A dual definition holds for the minmax v̄. Whenever v = v̄, the game has a uniform value, denoted by v∞. Remark that the existence of v∞ implies:

    v∞ = lim_{n→∞} vn = lim_{λ→0} vλ.

SLIDE 16

Mertens and Neyman (1981): Finite stochastic games with standard signalling: v∞ exists.

Aumann and Maschler (1967): Games with incomplete information on both sides:

    minmax = Vex Cav u
    maxmin = Cav Vex u

  • games with no uniform value
  • influence of the signalling structure on the value for stochastic games (Coulomb, 2001). In a 2×3 absorbing game (top-row payoffs include the absorbing entries 1∗ and 0∗; Player 1's signals after Bottom are a, b, a, so the first and third columns are then indistinguishable), Player 2 can play (0, ε, 1 − ε) i.i.d. (hence generating a distribution (1 − ε, ε) on (a, b)) until exhausting the probability of Top, and then switch to (1 − ε, ε, 0) without being detected.

SLIDE 17

7 Open problems

Asymptotic analysis
a) Conjecture: lim vn = lim vλ

  • in the "finite" case
  • for finite continuous stochastic games (non-algebraic)

b) explicit characterization of the limit (through variational inequalities)
c) extension to random duration

Uniform approach
d) Conjecture: minmax and maxmin exist in the "finite" case
e) relation with stability/viability of an underlying differential inclusion
f) description of the "natural" state space through sufficient statistics

Repeated games with automata

SLIDE 18

8 References

Aumann R.J. and M. Maschler (1995), Repeated Games with Incomplete Information, M.I.T. Press (with the collaboration of R. Stearns).
Bewley T. and E. Kohlberg (1976a), The asymptotic theory of stochastic games, Mathematics of Operations Research, 1, 197-208.
Bewley T. and E. Kohlberg (1976b), The asymptotic solution of a recursion equation occurring in stochastic games, Mathematics of Operations Research, 1, 321-336.
Coulomb J.-M. (2001), Repeated games with absorbing states and signalling structure, Mathematics of Operations Research, 26, 286-303.
Kohlberg E. (1974), Repeated games with absorbing states, Annals of Statistics, 2, 724-738.
Laraki R. (2001), Variational inequalities, systems of functional equations and incomplete information repeated games, SIAM Journal on Control and Optimization, 40, 516-524.
Lehrer E. and S. Sorin (1992), A uniform Tauberian theorem in dynamic programming, Mathematics of Operations Research, 17, 303-307.
Mertens J.-F. and A. Neyman (1981), Stochastic games, International Journal of Game Theory, 10, 53-66.
Mertens J.-F., S. Sorin and S. Zamir (1994), Repeated Games, CORE D.P. 9420-21-22.
Mertens J.-F. and S. Zamir (1971), The value of two-person zero-sum repeated games with lack of information on both sides, International Journal of Game Theory, 1, 39-64.
Mertens J.-F. and S. Zamir (1985), Formulation of Bayesian analysis for games with incomplete information, International Journal of Game Theory, 14, 1-29.
Neyman A. (1998), Nonexpansive mappings and stochastic games, in Stochastic Games and Applications, A. Neyman and S. Sorin (eds.), Kluwer A. P., to appear.
Neyman A. and S. Sorin (2001), Zero sum two person repeated games with public uncertain duration process, Cahier du Laboratoire d'Econometrie, Ecole Polytechnique, 2001-013 and Center for Rationality and Interactive Decision Theory Discussion Paper, 259.
Rosenberg D. and S. Sorin (2001), An operator approach to zero-sum repeated games, Israel Journal of Mathematics, 121, 221-246.
Shapley L. S. (1953), Stochastic games, Proceedings of the National Academy of Sciences of the U.S.A., 39, 1095-1100.
Sorin S. (2000), New approaches and recent advances in two-person zero-sum repeated games, in Proceedings of the ISDG Conference, Adelaide 2000, A. Nowak (ed.), Birkhäuser, to appear.
Sorin S. (2001), Asymptotic properties of monotonic nonexpansive mappings, Cahier du Laboratoire d'Econometrie, Ecole Polytechnique, 2001-012.
Sorin S. (2002), A First Course on Zero-Sum Repeated Games, Springer.
Sorin S. (2002), The operator approach to zero-sum stochastic games, in Stochastic Games and Applications, A. Neyman and S. Sorin (eds.), Kluwer A. P., to appear.
