

SLIDE 1

Interactions and dynamics: some aspects of repeated zero-sum games

Sylvain Sorin
Laboratoire d'Econométrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris
and Equipe Combinatoire, UFR 921, Université P. et M. Curie - Paris 6, 175 Rue du Chevaleret, 75013 Paris, France
sorin@poly.polytechnique.fr

Winter School on Complex Systems, December 9-13, 2002, Ecole Normale Supérieure de Lyon

SLIDE 2

Contents

1 Introduction
  1.1 Zero-sum games
  1.2 Repetition, information and interaction
  1.3 Evaluation: asymptotic approach, uniform approach
2 Stochastic games
  2.1 Description
  2.2 Results
3 Incomplete information games
  3.1 Description
  3.2 Results
4 Recursive structure and discrete dynamics
  4.1 Representation of a game with incomplete information as a stochastic game
  4.2 General repeated game
  4.3 Recursive formula
  4.4 Examples
    4.4.1 Stochastic games
    4.4.2 Incomplete information games
5 Operator approach
6 Uniform approach
7 Open problems
8 References

SLIDE 3

1 Introduction

1.1 Zero-sum games

           1/4   3/4
    1/2  (  2     0 )
    1/2  ( −1     1 )        v = 1/2

Minmax theorem (von Neumann): Let A be an I×J matrix. There exist v ∈ IR, x ∈ ∆(I), y ∈ ∆(J) such that

    xAy′ ≥ v for all y′ ∈ ∆(J)
    x′Ay ≤ v for all x′ ∈ ∆(I)

In the example, the margins give the optimal strategies x = (1/2, 1/2) and y = (1/4, 3/4), which guarantee the value v = 1/2.
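The theorem is effective: the value and an optimal strategy solve a linear program. A minimal sketch (assuming scipy is available; `matrix_game_value` is a hypothetical helper name, and the matrix is the 2×2 example above):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and optimal row strategy x of the zero-sum game with payoff matrix A.

    Solves: max v  s.t.  sum_i x_i A[i][j] >= v for every column j,
            x in the simplex.  Decision variables are (x_1, ..., x_m, v).
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])          # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])          # v - sum_i x_i A[i][j] <= 0, each column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # x sums to 1
    bounds = [(0, None)] * m + [(None, None)]          # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[m], res.x[:m]

v, x = matrix_game_value([[2, 0], [-1, 1]])
print(v, x)  # value 1/2, optimal strategy (1/2, 1/2)
```

The dual LP yields the column player's optimal strategy; here it is (1/4, 3/4), matching the margins above.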

1.2 Repetition, information and interaction

Repetition allows for:

  • Coordination
  • Threats

as a function of the information available along the play. In the zero-sum case, the impact is only through the evolution of a jointly controlled state variable.

1.3 Evaluation: asymptotic approach, uniform approach

sequence of stage payoffs gn, n = 1, 2, . . .

  • asymptotic approach:

for each averaging rule θ, a value vθ; study the limiting behavior of the family vθ

SLIDE 4
  • uniform approach:

properties independent of the (long) duration of the interaction

SLIDE 5

2 Stochastic games

2.1 Description

Finite two-person zero-sum stochastic game:

  • state space Ω
  • action spaces I and J
  • payoff function g from Ω×I×J to IR
  • initial state, ω1, known to both players
  • at each stage t + 1, a transition Q(·|ωt, it, jt)∈∆(Ω)

determines the law of the new state ωt+1, which is announced to both players.

X = ∆(I), Y = ∆(J); g and Q are extended by bilinearity to X×Y.

Example (the "Big Match"; ∗ denotes an absorbing payoff):

         α    β
    a    1∗   0∗
    b    0    1

SLIDE 6

2.2 Results

Shapley's Theorem (1953): The value vλ of the λ-discounted game is the unique fixed point of the operator f → Φ(λ, f) from IR^Ω to itself:

    Φ(λ, f)(ω) = valX×Y{ λ g(ω, x, y) + (1 − λ) ∫_Ω f(ω′) Q(dω′|ω, x, y) }

where valX×Y stands for the value operator:

    valX×Y = maxX minY = minY maxX.

Bewley and Kohlberg (1976a, 1976b): Algebraic approach: vλ has an expansion in Puiseux series, hence limλ→0 vλ exists and limn→∞ vn = limλ→0 vλ.

Mertens and Neyman (1981): General stochastic game: vλ of bounded variation implies lim vn = lim vλ (and the existence of v∞ under standard signalling).

Lehrer and Sorin (1992): Markov decision process: uniform convergence of vλ is equivalent to uniform convergence of vn, and the limits are the same. There is an example with Ω countable where both limits exist and differ.
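Shapley's fixed-point characterization suggests value iteration on Φ(λ, ·), a contraction of modulus 1 − λ. A sketch (assuming scipy is available; the game is the Big Match above, modeled with three states: ongoing, absorbed at payoff 1, absorbed at payoff 0; for this game the fixed point gives vλ = 1/2 at the ongoing state for every λ):

```python
import numpy as np
from scipy.optimize import linprog

def val(A):
    """Value of the zero-sum matrix game A (row player maximizes), via LP."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[m]

# Big Match as a 3-state stochastic game: 0 = ongoing, 1 = absorbed at 1, 2 = absorbed at 0.
g = np.array([[[1.0, 0.0], [0.0, 1.0]],   # state 0: stage payoffs of (a, b) vs (alpha, beta)
              [[1.0, 1.0], [1.0, 1.0]],   # state 1: constant payoff 1
              [[0.0, 0.0], [0.0, 0.0]]])  # state 2: constant payoff 0
Q = np.array([[[1, 2], [0, 0]],           # Q[w][i][j] = next state (deterministic here)
              [[1, 1], [1, 1]],
              [[2, 2], [2, 2]]])

def shapley_iterate(lam, iters=200):
    """Iterate f <- Phi(lam, f), i.e. f(w) = val(lam*g(w) + (1-lam)*f(next state))."""
    f = np.zeros(3)
    for _ in range(iters):
        f = np.array([val(lam * g[w] + (1 - lam) * f[Q[w]]) for w in range(3)])
    return f

v = shapley_iterate(0.2)
print(v)  # approximately [0.5, 1.0, 0.0]
```

The optimal one-stage strategy at the ongoing state, x = (λ/(1+λ), 1/(1+λ)), depends on λ; only the value is constant in λ, which is what makes the uniform analysis of the Big Match delicate.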

SLIDE 7

3 Incomplete information games

3.1 Description

Two-person zero-sum repeated games with incomplete information, Aumann and Maschler (1995). Simple case: independent information and standard signalling.

  • parameter space: K×L
  • endowed with a product probability π = p⊗q ∈ ∆(K)×∆(L) according to which (k, ℓ) is chosen
  • k is told to Player 1 and ℓ to Player 2, hence the players have partial private information on the parameter (k, ℓ), which is fixed for the duration of the play
  • after each stage t the players are told the previous moves (it, jt)

A one-stage strategy of Player 1 is an element x in X = ∆(I)K (resp. y in Y = ∆(J)L for Player 2).

SLIDE 8

3.2 Results

Aumann and Maschler (1966-68): Lack of information on one side:

    lim vn = lim vλ = v (= v∞)

Mertens and Zamir (1971-72): Lack of information on both sides:

    lim vn = lim vλ = v

Characterization of v: existence and uniqueness of the solution of the functional equations

    v = Cavp min(u, v)
    v = Vexq max(u, v)

where u is the value of the non-revealing game (neither player transmits, i.e. uses, his own information) and Cav (resp. Vex) is the concavification (resp. convexification) operator: given f from a convex set C to IR, CavC f is the smallest concave function greater than f on C.
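On a grid, Cav can be computed as an upper concave envelope, i.e. the upper convex hull of the graph. A sketch (plain Python; the non-revealing value u(p) = (p − 1/2)² on ∆({0,1}) ≅ [0, 1] is illustrative, not taken from the lectures; since this u is convex with u(0) = u(1) = 1/4, its concavification is the constant 1/4, the chord between the endpoints):

```python
def cav(ps, us):
    """Upper concave envelope of the points (ps[i], us[i]), ps strictly increasing.

    Builds the upper convex hull (monotone chain, keeping only clockwise turns),
    then evaluates the piecewise-linear envelope back on the grid.
    """
    pts = list(zip(ps, us))
    hull = []
    for p in pts:
        # Pop while the last two hull points and p do not make a right turn.
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    out, k = [], 0
    for x, _ in pts:
        while k + 2 < len(hull) and hull[k + 1][0] <= x:
            k += 1
        (x0, y0), (x1, y1) = hull[k], hull[k + 1]
        t = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
        out.append(y0 + t * (y1 - y0))
    return out

n = 100
ps = [i / n for i in range(n + 1)]
us = [(p - 0.5) ** 2 for p in ps]   # hypothetical non-revealing value u
cs = cav(ps, us)
print(cs[n // 2])  # Cav u(1/2) = 1/4 > u(1/2) = 0: revealing nothing is not optimal here
```

For a concave u the envelope returns u itself, reflecting that a concave non-revealing value means information is not worth using.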

SLIDE 9

4 Recursive structure and discrete dynamics

4.1 Representation of a game with incomplete information as a stochastic game

  • state space: χ = ∆(K)×∆(L) (beliefs of the players on the parameter along the play)

Recall that a one-stage strategy of Player 1 is an element x in X = ∆(I)K (resp. y in Y = ∆(J)L for Player 2).

  • transition: Π : χ×X×Y → ∆(χ) with

    Π((p(i), q(j)) | (p, q), x, y) = x(i) y(j)

where p(i) is the conditional probability on K given the move i and x(i) is the probability of this move (similarly y(j) for Player 2). Explicitly:

    x(i) = Σ_k p^k x^k_i   and   p^k(i) = p^k x^k_i / x(i).
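The transition of this auxiliary stochastic game is Bayesian updating of the beliefs. A minimal sketch (plain Python; `update` is a hypothetical helper, and the two-type prior and strategy are made up for illustration); it computes the total move probabilities x(i), the posteriors p(i), and checks the martingale property Σ_i x(i) p(i) = p:

```python
def update(p, x):
    """Bayesian update of the belief on K after Player 1's move.

    p : prior, p[k] = probability of type k
    x : one-stage strategy, x[k][i] = probability that type k plays move i
    Returns (xbar, posts): xbar[i] = total probability of move i,
    posts[i][k] = posterior probability of type k given move i.
    """
    K, I = len(p), len(x[0])
    xbar = [sum(p[k] * x[k][i] for k in range(K)) for i in range(I)]
    posts = [[p[k] * x[k][i] / xbar[i] if xbar[i] > 0 else p[k] for k in range(K)]
             for i in range(I)]
    return xbar, posts

p = [0.5, 0.5]                 # uniform prior on two types
x = [[0.9, 0.1], [0.1, 0.9]]   # a revealing strategy: type k mostly plays move k
xbar, posts = update(p, x)
print(xbar, posts)             # moves are 50/50; each move makes its type 90% likely

# Martingale property: the expected posterior is the prior.
barycenter = [sum(xbar[i] * posts[i][k] for i in range(len(xbar))) for k in range(len(p))]
print(barycenter)              # equals the prior [0.5, 0.5]
```

With a non-revealing strategy (x^k independent of k) the posterior equals the prior, which is the situation where the value u of the non-revealing game is the relevant benchmark.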

SLIDE 10

4.2 General repeated game

  • parameter space M
  • action spaces I and J for Player 1 and 2 respectively
  • payoff function g from I×J×M to IR
  • signal sets A and B

(Assume all sets finite, avoiding measurability issues)

  • initial position: parameter m1, signal a1 (resp. b1) for Player 1 (resp. Player 2), drawn according to π, a probability on M×A×B
  • transition Q from M×I×J to probabilities on M×A×B: at stage t, given the state mt and the moves (it, jt),

    (mt+1, at+1, bt+1) ∼ Q(mt, it, jt)

  • play of the game: m1, a1, b1, i1, j1, m2, a2, b2, i2, j2, . . .
  • information of Player 1 before his play at stage t: a private history of the form (a1, i1, a2, i2, . . ., at) (similarly for Player 2)
  • sequence of payoffs: g1, g2, . . ., gt, . . . with gt = g(it, jt, mt)
  • strategy for Player 1: σ, a map from private histories to ∆(I), the probabilities on the set I of actions; τ is defined similarly for Player 2

SLIDE 11

A couple (σ, τ) induces, together with the components of the game π and Q, a distribution Pσ,τ on plays, hence on the sequence of payoffs. Consider:

1) the finite n-stage game Γn with payoff given by the average of the first n rewards:

    γn(σ, τ) = Eσ,τ( (1/n) Σ_{t=1}^n gt )

2) the λ-discounted game Γλ with payoff equal to the discounted sum of the rewards:

    γλ(σ, τ) = Eσ,τ( Σ_{t=1}^∞ λ(1 − λ)^{t−1} gt )

The values of these games are denoted by vn and vλ respectively. The analysis of their asymptotic behavior, as n goes to ∞ or λ goes to 0, is the study of the asymptotic game.
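Both evaluations are probability-weighted averages of the payoff stream: uniform weights 1/n for γn, geometric weights λ(1 − λ)^{t−1} (which sum to 1) for γλ. A quick sketch (plain Python; the deterministic alternating payoff stream is chosen for illustration):

```python
def gamma_n(g, n):
    """Average of the first n stage payoffs."""
    return sum(g[:n]) / n

def gamma_lambda(g, lam):
    """lam-discounted evaluation of a (long, truncated) payoff stream."""
    return sum(lam * (1 - lam) ** t * gt for t, gt in enumerate(g))

g = [1.0 if t % 2 == 0 else 0.0 for t in range(10_000)]  # payoffs 1, 0, 1, 0, ...

print(gamma_n(g, 1000))       # Cesaro average: 1/2
print(gamma_lambda(g, 0.1))   # discounted value 1/(2 - lam); tends to 1/2 as lam -> 0
print(sum(0.1 * 0.9 ** t for t in range(10_000)))  # geometric weights sum to about 1
```

Here both evaluations agree in the limit, but the uniform approach of Section 6 asks for much more: a single strategy good simultaneously for all long horizons.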

SLIDE 12

4.3 Recursive formula

The recursive structure relies on the construction of the universal belief space, Mertens and Zamir (1985): the infinite hierarchy of beliefs on M is canonically represented by Ξ = M×Θ1×Θ2, where Θi, homeomorphic to ∆(M×Θ−i), is the type set of Player i.

An information scheme is a probability on M×A×B (parameter × signals). It induces a consistent distribution Q on Ξ: for any Borel subset B of Ξ,

    Q(B) = ∫_Ξ θi(ζ)(B) Q(dζ)

where θi is the canonical projection from Ξ to Θi.

  • the strategies of the players and the signalling structure in the game, before the moves at stage t, define a probability on t-histories, hence an information scheme, thus a consistent distribution on Ξ: the entrance law Pt
  • Pt and the (behavioral) strategies at stage t (maps from types to mixed actions, αt : Θ1 → ∆(I) for Player 1, resp. βt for Player 2) determine the current payoff gt and the new entrance law Pt+1 = L(Pt, αt, βt)
  • the stationary aspect of the repeated game is expressed by the fact that L does not depend on the stage t

SLIDE 13

The Shapley operator maps the set of real bounded functions defined on the space of consistent probabilities (in ∆(Ξ)) to itself:

    Ψ(f)(P) = valα×β{ g(P, α, β) + f(L(P, α, β)) }

Mertens, Sorin and Zamir (1994), Sections III.1, III.2, IV.3:

    n vn = Ψ^n(0),    vλ/λ = Ψ( (1 − λ) vλ/λ ).

Problems: asymptotic behavior of vλ as λ → 0 and of vn as n → ∞. Convergence? Convergence to the same limit w?

4.4 Examples

4.4.1 Stochastic games

Ψ operates on IR^Ω:

    Ψ(f)(ω) = valX×Y{ g(ω, x, y) + ∫_Ω f(ω′) Q(dω′|ω, x, y) }

4.4.2 Incomplete information games

Ψ is an operator on the set of real bounded saddle (concave/convex) functions on χ:

    Ψ(f)(p, q) = valX×Y{ g(p, q, x, y) + ∫_χ f(p′, q′) Π(d(p′, q′)|(p, q), x, y) }

with

    g(p, q, x, y) = Σ_{k,ℓ} p^k q^ℓ g(k, ℓ, x^k, y^ℓ).

SLIDE 14

5 Operator approach

Consider mappings Ψ satisfying:

  • domain: F, a cone of bounded real functions on Ω containing the constants
  • properties:

(A) Monotonicity:

    f ≥ g ⇒ Ψ(f) ≥ Ψ(g).   (1)

(B) Reduction of constants:

    Ψ(f + a) ≤ Ψ(f) + a, ∀a ≥ 0.   (2)

In particular Ψ is nonexpansive. Recall

    vn = (1/n) Ψ((n − 1) vn−1) = (1/n) Ψ^n(0)
    vλ = λ Ψ( ((1 − λ)/λ) vλ )

Introduce, for 1 > ε > 0, a family of operators Φ(ε, ·):

    Φ(ε, x) = ε Ψ( ((1 − ε)/ε) x )   (3)

Then

    vn = Φ(1/n, vn−1),    vλ = Φ(λ, vλ).   (4)

Conditions for the convergence of both families to the same function rely on the study of

    ϕ(f)(ω) = lim_{ε→0+} [ Φ(ε, f)(ω) − Φ(0, f)(ω) ] / ε.
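Monotonicity plus reduction of constants forces nonexpansiveness: f ≤ g + ‖f − g‖∞ gives Ψ(f) ≤ Ψ(g) + ‖f − g‖∞, and symmetrically. A numerical sanity check on the matrix-game value operator, which satisfies (A) and (B) in its payoff argument (assuming scipy is available; random matrices stand in for Ψ's argument):

```python
import numpy as np
from scipy.optimize import linprog

def val(A):
    """Value of the zero-sum matrix game A (row player maximizes), via LP."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[m]

rng = np.random.default_rng(0)
for _ in range(20):
    A = rng.uniform(-1, 1, (3, 4))
    B = rng.uniform(-1, 1, (3, 4))
    # (A) Monotonicity: entrywise smaller matrix has smaller value.
    assert val(np.minimum(A, B)) <= val(np.maximum(A, B)) + 1e-8
    # (B) Constants: val(A + a) = val(A) + a.
    assert abs(val(A + 0.3) - (val(A) + 0.3)) < 1e-7
    # Hence nonexpansiveness: |val(A) - val(B)| <= max |A - B|.
    assert abs(val(A) - val(B)) <= np.abs(A - B).max() + 1e-8
print("monotonicity, reduction of constants and nonexpansiveness verified")
```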

SLIDE 15

6 Uniform approach

Recall

    γn(σ, τ) = Eσ,τ( (1/n) Σ_{t=1}^n gt )

v is the maxmin if the two following conditions are satisfied:

  • Player 1 can guarantee v: for any ε > 0, there exists a strategy σ of Player 1 and an integer N such that for any n ≥ N and any strategy τ of Player 2:

    γn(σ, τ) ≥ v − ε

(It follows from the uniformity in τ that if Player 1 can guarantee f, both lim inf_{n→∞} vn and lim inf_{λ→0} vλ will be greater than f.)

  • Player 2 can defend v: for any ε > 0 and any strategy σ of Player 1, there exist an integer N and a strategy τ of Player 2 such that for all n ≥ N:

    γn(σ, τ) ≤ v + ε.

(Note that satisfying this requirement is stronger than contradicting the previous condition; hence the existence of v is an issue.)

A dual definition holds for the minmax v̄. Whenever v = v̄, the game has a uniform value, denoted by v∞. Remark that the existence of v∞ implies:

    v∞ = lim_{n→∞} vn = lim_{λ→0} vλ.

SLIDE 16

Mertens and Neyman (1981): Finite stochastic games with standard signalling: v∞ exists.

Aumann and Maschler (1967): Games with incomplete information on both sides:

    minmax = Vex Cav u
    maxmin = Cav Vex u

  • games with no uniform value
  • influence of the signalling structure on the value for stochastic games (Coulomb, 2001). In a 2×3 absorbing game (top-row payoffs include the absorbing entries 1∗ and 0∗; Player 1's signals after Bottom are a, b, a, so the first and third columns are then indistinguishable), Player 2 can play (0, ε, 1 − ε) i.i.d. (hence generating a distribution (1 − ε, ε) on (a, b)) until exhausting the probability of Top, and then switch to (1 − ε, ε, 0) without being detected.

SLIDE 17

7 Open problems

Asymptotic analysis
a) Conjecture: lim vn = lim vλ

  • in the "finite" case
  • for finite continuous stochastic games (non-algebraic)

b) explicit characterization of the limit (through variational inequalities)
c) extension to random duration

Uniform approach
d) Conjecture: minmax and maxmin exist in the "finite" case
e) relation with stability/viability of an underlying differential inclusion
f) description of the "natural" state space through sufficient statistics

Repeated games with automata

SLIDE 18

8 References

Aumann R.J. and M. Maschler (1995), Repeated Games with Incomplete Information, M.I.T. Press (with the collaboration of R. Stearns).
Bewley T. and E. Kohlberg (1976a), The asymptotic theory of stochastic games, Mathematics of Operations Research, 1, 197-208.
Bewley T. and E. Kohlberg (1976b), The asymptotic solution of a recursion equation occurring in stochastic games, Mathematics of Operations Research, 1, 321-336.
Coulomb J.-M. (2001), Repeated games with absorbing states and signalling structure, Mathematics of Operations Research, 26, 286-303.
Kohlberg E. (1974), Repeated games with absorbing states, Annals of Statistics, 2, 724-738.
Laraki R. (2001), Variational inequalities, systems of functional equations and incomplete information repeated games, SIAM Journal on Control and Optimization, 40, 516-524.
Lehrer E. and S. Sorin (1992), A uniform Tauberian theorem in dynamic programming, Mathematics of Operations Research, 17, 303-307.
Mertens J.-F. and A. Neyman (1981), Stochastic games, International Journal of Game Theory, 10, 53-66.
Mertens J.-F., S. Sorin and S. Zamir (1994), Repeated Games, CORE D.P. 9420-21-22.
Mertens J.-F. and S. Zamir (1971), The value of two-person zero-sum repeated games with lack of information on both sides, International Journal of Game Theory, 1, 39-64.
Mertens J.-F. and S. Zamir (1985), Formulation of Bayesian analysis for games with incomplete information, International Journal of Game Theory, 14, 1-29.
Neyman A. (1998), Nonexpansive mappings and stochastic games, in Stochastic Games and Applications, A. Neyman and S. Sorin (eds.), Kluwer A. P., to appear.
Neyman A. and S. Sorin (2001), Zero sum two person repeated games with public uncertain duration process, Cahier du Laboratoire d'Econometrie, Ecole Polytechnique, 2001-013 and Center for Rationality and Interactive Decision Theory Discussion Paper, 259.
Rosenberg D. and S. Sorin (2001), An operator approach to zero-sum repeated games, Israel Journal of Mathematics, 121, 221-246.
Shapley L. S. (1953), Stochastic games, Proceedings of the National Academy of Sciences of the U.S.A., 39, 1095-1100.
Sorin S. (2000), New approaches and recent advances in two-person zero-sum repeated games, in Proceedings of the ISDG Conference, Adelaide 2000, A. Nowak (ed.), Birkhäuser, to appear.
Sorin S. (2001), Asymptotic properties of monotonic nonexpansive mappings, Cahier du Laboratoire d'Econometrie, Ecole Polytechnique, 2001-012.
Sorin S. (2002), A First Course on Zero-Sum Repeated Games, Springer.
Sorin S. (2002), The operator approach to zero-sum stochastic games, in Stochastic Games and Applications, A. Neyman and S. Sorin (eds.), Kluwer A. P., to appear.
