1. Interactions and dynamics: some aspects of repeated zero-sum games

Sylvain Sorin
Laboratoire d'Économétrie, École Polytechnique, 1 rue Descartes, 75005 Paris
and Équipe Combinatoire, UFR 921, Université P. et M. Curie - Paris 6, 175 Rue du Chevaleret, 75013 Paris, France
sorin@poly.polytechnique.fr

Winter School on Complex Systems, December 9-13, 2002
École Normale Supérieure de Lyon

2. Contents

1 Introduction
  1.1 Zero-sum games
  1.2 Repetition, information and interaction
  1.3 Evaluation: asymptotic approach, uniform approach
2 Stochastic games
  2.1 Description
  2.2 Results
3 Incomplete information games
  3.1 Description
  3.2 Results
4 Recursive structure and discrete dynamics
  4.1 Representation of a game with incomplete information as a stochastic game
  4.2 General repeated game
  4.3 Recursive formula
  4.4 Examples
    4.4.1 Stochastic games
    4.4.2 Incomplete information games
5 Operator approach
6 Uniform approach
7 Open problems
8 References

3. 1 Introduction

1.1 Zero-sum games

Example: the matrix game

              y = (1/4, 3/4)
  x = 1/2   [  2    0 ]
  x = 1/2   [ −1    1 ]

has value v = 1/2, guaranteed by the optimal mixed strategies x = (1/2, 1/2) for the row player and y = (1/4, 3/4) for the column player.

Minmax theorem (von Neumann): Let A be an I × J matrix. There exist v ∈ ℝ, x ∈ ∆(I), y ∈ ∆(J) such that
x A y′ ≥ v, ∀y′ and x′ A y ≤ v, ∀x′.

1.2 Repetition, information and interaction

Repetition allows for:
- coordination
- threats
as a function of the information revealed along the play. In the zero-sum case, the impact is only through the evolution of a jointly controlled state variable.

1.3 Evaluation: asymptotic approach, uniform approach

Sequence of stage payoffs g_n, n = 1, ...
- asymptotic approach: for each averaging rule θ, value v_θ; limiting behavior of the family v_θ.
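A quick numerical check of the example above (plain Python; the matrix and both strategies are exactly those displayed): the row strategy x guarantees at least v against every pure column, and y concedes at most v against every pure row, so both guarantees meet at the value 1/2.

```python
# 2x2 matrix game from the example: value v = 1/2.
A = [[2, 0],
     [-1, 1]]
x = [0.5, 0.5]    # optimal mixed strategy of the maximizer (rows)
y = [0.25, 0.75]  # optimal mixed strategy of the minimizer (columns)

# Payoff of x against each pure column j: must be >= v for every j.
row_guarantee = min(sum(x[i] * A[i][j] for i in range(2)) for j in range(2))
# Payoff of each pure row i against y: must be <= v for every i.
col_guarantee = max(sum(A[i][j] * y[j] for j in range(2)) for i in range(2))

print(row_guarantee, col_guarantee)  # 0.5 0.5 — both equal the value
```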

4. - uniform approach: properties independent of the (long) duration of the interaction.

5. 2 Stochastic games

2.1 Description

Finite two-person zero-sum stochastic game:
- state space Ω
- action spaces I and J
- payoff function g from Ω × I × J to ℝ
- initial state ω_1, known to both players
- at each stage t + 1, a transition Q(·|ω_t, i_t, j_t) ∈ ∆(Ω) determines the law of the new state ω_{t+1}, announced to each player.

X = ∆(I), Y = ∆(J); g and Q are extended by bilinearity to X × Y.

Example (an absorbing game, the "Big Match"; a * marks an absorbing entry):

         α     β
  a     1*    0*
  b     0     1

6. 2.2 Results

Shapley's Theorem (1953): The value v_λ of the λ-discounted game is the only fixed point of the operator f ↦ Φ(λ, f) from ℝ^Ω to itself:

Φ(λ, f)(ω) = val_{X×Y} { λ g(ω, x, y) + (1 − λ) ∫_Ω f(ω′) Q(dω′ | ω, x, y) }

where val_{X×Y} stands for the value operator: val_{X×Y} = max_X min_Y = min_Y max_X.

Bewley and Kohlberg (1976a, 1976b): Algebraic approach: v_λ has an expansion in Puiseux series, hence lim_{λ→0} v_λ exists and lim_{n→∞} v_n = lim_{λ→0} v_λ.

Mertens and Neyman (1981): General stochastic game: λ ↦ v_λ of bounded variation implies lim v_n = lim v_λ (and the existence of v_∞ under standard signalling).

Lehrer and Sorin (1992): Markov Decision Process: uniform convergence of v_λ is equivalent to uniform convergence of v_n, and the limits are the same. Example with Ω countable where both limits exist and differ.
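Shapley's fixed-point characterization can be checked numerically on the absorbing game of Section 2.1 (the Big Match). The sketch below (plain Python; the helper `val2x2` is ours, not from any library) iterates f ↦ Φ(λ, f) on the single non-absorbing state. An absorbing entry with payoff g contributes g at every remaining stage, so the entries 1* and 0* have total (normalized) value 1 and 0; the iteration is a (1 − λ)-contraction and converges to v_λ = 1/2 for every λ.

```python
def val2x2(M):
    """Value of a 2x2 zero-sum matrix game M (row player maximizes)."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:          # pure saddle point
        return maximin
    # no pure saddle point: both players fully mix, closed form for 2x2
    return (a * d - b * c) / (a + d - b - c)

def v_lambda(lam, iters=500):
    """Approximate fixed point of f -> Phi(lam, f) at the non-absorbing state."""
    f = 0.0
    for _ in range(iters):
        # (a, alpha) and (a, beta) absorb with payoffs 1 and 0;
        # (b, .) stays in the non-absorbing state, continuation value f.
        M = [[1.0,           0.0],
             [(1 - lam) * f, lam + (1 - lam) * f]]
        f = val2x2(M)
    return f

print(v_lambda(0.1))  # ≈ 0.5, and the same for any other lam
```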

7. 3 Incomplete information games

3.1 Description

Two-person zero-sum repeated games with incomplete information, Aumann and Maschler (1995). Simple case: independent information and standard signalling.
- parameter space K × L, endowed with a product probability π = p ⊗ q ∈ ∆(K) × ∆(L) according to which (k, ℓ) is chosen
- k is told to Player 1 and ℓ to Player 2; hence the players have partial private information on the parameter (k, ℓ), which is fixed for the duration of the play
- after each stage t the players are told the previous moves (i_t, j_t).

A one-stage strategy of Player 1 is an element x in X = ∆(I)^K (resp. y in Y = ∆(J)^L for Player 2).

8. 3.2 Results

Aumann and Maschler (1966-68): Lack of information on one side:
lim v_n = lim v_λ = v (= v_∞)

Mertens and Zamir (1971-72): Lack of information on both sides:
lim v_n = lim v_λ = v

Characterization of v: existence and uniqueness of the solution of the functional equations
v = Cav_p min(u, v)
v = Vex_q max(u, v)
where u is the value of the non-revealing game (none of the players transmits, i.e. uses, his own information) and Cav (resp. Vex) is the concavification (resp. convexification) operator: given f from a convex set C to ℝ, Cav_C f is the smallest concave function greater than f on C.
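On a one-dimensional grid, concavification is easy to approximate: Cav f is the upper concave envelope of the graph of f, which a monotone-chain upper-hull pass computes. A small sketch in plain Python, with a hypothetical u chosen only for illustration (it is not the non-revealing value of any particular game): u(p) = (p − 1/2)² on [0, 1], whose concavification is the constant 1/4, since u attains its maximum 1/4 at both endpoints.

```python
def cav_on_grid(ps, us):
    """Values of the smallest concave function above the points (p, u(p)),
    evaluated back at the grid points ps (assumed sorted increasing)."""
    pts = list(zip(ps, us))
    hull = []  # upper concave hull, built left to right
    for p in pts:
        # drop the middle point whenever it lies on or below the chord
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    # evaluate the piecewise-linear envelope on the grid
    out, k = [], 0
    for x, _ in pts:
        while k + 1 < len(hull) and hull[k + 1][0] < x:
            k += 1
        (x0, y0), (x1, y1) = hull[k], hull[min(k + 1, len(hull) - 1)]
        out.append(y0 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out

ps = [i / 10 for i in range(11)]
us = [(p - 0.5) ** 2 for p in ps]
print(cav_on_grid(ps, us))  # all values ≈ 0.25: here Cav u is constant
```

That Cav u > u in the interior is exactly the situation where the informed player gains by (partially) revealing, via the splitting of the prior p into posteriors.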

9. 4 Recursive structure and discrete dynamics

4.1 Representation of a game with incomplete information as a stochastic game

- state space χ = ∆(K) × ∆(L) (beliefs of the players on the parameter along the play)

Recall that a one-stage strategy of Player 1 is an element x in X = ∆(I)^K (resp. y in Y = ∆(J)^L for Player 2).

- transition Π : χ × X × Y → ∆(χ):
  • Π((p(i), q(j)) | (p, q), x, y) = x̄(i) ȳ(j),
  • p(i) is the conditional probability on K given the move i,
  • x̄(i) is the probability of this move (similarly ȳ(j) for Player 2).

Explicitly: x̄(i) = Σ_k p^k x^k_i and p^k(i) = p^k x^k_i / x̄(i).
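The belief dynamics above are plain Bayesian updating. A minimal sketch in Python (the names `prior`, `x` and the two example strategies are illustrative; `x[k][i]` is the probability that type k plays move i):

```python
def update_belief(prior, x, i):
    """Posterior on K after observing move i, plus the total probability of i.

    prior: prior[k] = probability of type k          (p^k)
    x:     x[k][i]  = probability that type k plays i (x^k_i)
    Returns (x_bar, posterior) = (x̄(i), p(i)).
    """
    x_bar = sum(pk * xk[i] for pk, xk in zip(prior, x))       # x̄(i)
    posterior = [pk * xk[i] / x_bar for pk, xk in zip(prior, x)]
    return x_bar, posterior

p = [0.5, 0.5]
# fully revealing: type 0 always plays move 0, type 1 always move 1
print(update_belief(p, [[1, 0], [0, 1]], 0))          # (0.5, [1.0, 0.0])
# non-revealing: both types play the same mixture -> posterior = prior
print(update_belief(p, [[0.5, 0.5], [0.5, 0.5]], 1))  # (0.5, [0.5, 0.5])
```

The second case is the key one: non-revealing strategies keep the state (p, q) fixed, which is exactly the non-revealing game whose value u appears in Section 3.2.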

10. 4.2 General repeated game

- parameter space M
- action spaces I and J for Players 1 and 2 respectively
- payoff function g from I × J × M to ℝ
- signal sets A and B (assume all sets finite, avoiding measurability issues)
- initial position: parameter m_1, signal a_1 (resp. b_1) for Player 1 (resp. Player 2), chosen according to π, a probability on M × A × B
- transition Q from M × I × J to probabilities on M × A × B. At stage t, given the state m_t and the moves (i_t, j_t):
  (m_{t+1}, a_{t+1}, b_{t+1}) ∼ Q(m_t, i_t, j_t)
- play of the game: m_1, a_1, b_1, i_1, j_1, m_2, a_2, b_2, i_2, j_2, ...
- information of Player 1 before his play at stage t: a private history of the form (a_1, i_1, a_2, i_2, ..., a_t) (similarly for Player 2)
- sequence of payoffs: g_1, g_2, ..., g_t, ... with g_t = g(i_t, j_t, m_t)
- strategy for Player 1: σ, a map from private histories to ∆(I), the probabilities on the set I of actions; τ is defined similarly for Player 2.

11. A couple (σ, τ) induces, together with the components of the game, π and Q, a distribution on plays, P_{σ,τ}, hence on the sequence of payoffs.

1) the finite n-stage game Γ_n with payoff given by the average of the first n rewards:
γ_n(σ, τ) = E_{σ,τ}( (1/n) Σ_{t=1}^n g_t )

2) the λ-discounted game Γ_λ with payoff equal to the discounted sum of the rewards:
γ_λ(σ, τ) = E_{σ,τ}( Σ_{t=1}^∞ λ(1 − λ)^{t−1} g_t )

The values of these games are denoted by v_n and v_λ respectively. The analysis of their asymptotic behavior, as n goes to ∞ or λ goes to 0, is the study of the asymptotic game.
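The two evaluations are easy to compare on a concrete payoff stream. A small sketch (plain Python; the alternating stream is a hypothetical example, not taken from any game above): for g_t = 1, 0, 1, 0, ... the Cesàro average is 1/2, and the λ-discounted evaluation equals 1/(2 − λ) exactly, which also tends to 1/2 as λ → 0, illustrating the agreement lim v_n = lim v_λ discussed throughout.

```python
def cesaro(g, n):
    """Average of the first n stage payoffs: the Gamma_n evaluation."""
    return sum(g(t) for t in range(1, n + 1)) / n

def discounted(g, lam, horizon=10_000):
    """sum of lam * (1-lam)^(t-1) * g_t, truncated: the Gamma_lambda evaluation."""
    return sum(lam * (1 - lam) ** (t - 1) * g(t) for t in range(1, horizon + 1))

g = lambda t: t % 2          # alternating stage payoffs 1, 0, 1, 0, ...
print(cesaro(g, 1000))       # 0.5
print(discounted(g, 0.01))   # 1/(2 - 0.01) ≈ 0.5025, -> 0.5 as lam -> 0
```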

12. 4.3 Recursive formula

The recursive structure relies on the construction of the universal belief space, Mertens and Zamir (1985). The infinite hierarchy of beliefs on M is canonically represented by Ξ = M × Θ¹ × Θ², where Θ^i, homeomorphic to ∆(M × Θ^{−i}), is the type set of Player i.

An information scheme is a probability on M × A × B (parameter × signals). It induces a consistent distribution Q on Ξ: for any Borel subset B of Ξ,
Q(B) = ∫_Ξ θ^i(ζ)(B) Q(dζ)
where θ^i is the canonical projection from Ξ to Θ^i.

- the strategies of the players and the signaling structure in the game, before the moves at stage t, define a probability on t-histories, hence an information scheme, thus a consistent distribution on Ξ: the entrance law P_t
- P_t and the (behavioral) strategies at stage t (maps from types to mixed actions, α_t : Θ¹ → ∆(I) for Player 1, resp. β_t for Player 2) determine the current payoff g_t and the new entrance law P_{t+1} = L(P_t, α_t, β_t)
- the stationary aspect of the repeated game is expressed by the fact that L does not depend on the stage t.

13. The Shapley operator maps the set of real bounded functions defined on the space of consistent probabilities (in ∆(Ξ)) to itself:

Ψ(f)(P) = val_{α×β} { g(P, α, β) + f(L(P, α, β)) }

Mertens, Sorin and Zamir (1994), Sections III.1, III.2, IV.3:

n v_n = Ψ^n(0),   v_λ/λ = Ψ( (1 − λ) v_λ/λ ).

Problems: asymptotic behavior of v_λ as λ → 0 or of v_n as n → ∞. Convergence? Convergence to the same w?

4.4 Examples

4.4.1 Stochastic games

Ψ operates on ℝ^Ω:
Ψ(f)(ω) = val_{X×Y} { g(ω, x, y) + ∫_Ω f(ω′) Q(dω′ | ω, x, y) }

4.4.2 Incomplete information games

Ψ is an operator on the set of real bounded saddle (concave/convex) functions on χ:
Ψ(f)(p, q) = val_{X×Y} { g(p, q, x, y) + ∫_χ f(p′, q′) Π(d(p′, q′) | (p, q), x, y) }
with g(p, q, x, y) = Σ_{k,ℓ} p^k q^ℓ g(k, ℓ, x^k, y^ℓ).
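The identity n v_n = Ψ^n(0) yields the n-stage values by iterating the operator. For the absorbing game of Section 2.1 this can be run directly; the sketch below (plain Python, with a small `val2x2` helper of our own) uses the equivalent normalized recursion v_n = val{ (1/n) g + ((n − 1)/n) v_{n−1}(·) } at the single non-absorbing state, and recovers the known fact that v_n = 1/2 at every horizon for this game.

```python
def val2x2(M):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:          # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)  # fully mixed case

def v_n(n):
    """n-stage value at the non-absorbing state of the absorbing game."""
    v = 0.0  # v_0
    for m in range(1, n + 1):
        w = (m - 1) / m * v
        # (a, alpha) absorbs with payoff 1 at every remaining stage -> 1
        # (a, beta)  absorbs with payoff 0                          -> 0
        # (b, .)     stays non-absorbing, continuation value v_{m-1}
        M = [[1.0, 0.0],
             [w,   1 / m + w]]
        v = val2x2(M)
    return v

print([round(v_n(n), 6) for n in (1, 2, 10, 100)])  # [0.5, 0.5, 0.5, 0.5]
```

Whether such operator iterations converge in general, and whether the n-stage and discounted limits agree, is precisely the "asymptotic game" problem stated above.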
