Generalised weakened fictitious play and random belief learning



  1. Generalised weakened fictitious play and random belief learning. David S. Leslie, 12 April 2010. Collaborators: Sean Collins, Claudio Mezzetti, Archie Chapman.

  2. Overview • Learning in games • Stochastic approximation • Generalised weakened fictitious play – Random belief learning – Oblivious learners

  3. Normal form games • Players $i = 1, \ldots, N$ • Action sets $A^i$ • Reward functions $r^i : A^1 \times \cdots \times A^N \to \mathbb{R}$

  4. Mixed strategies • Mixed strategies $\pi^i \in \Delta^i$ • Joint mixed strategy $\pi = (\pi^1, \ldots, \pi^N)$ • Reward function extended so that $r^i(\pi) = E_\pi[r^i(a)]$

  5. Best responses Assume the other players use mixed strategy $\pi^{-i}$. Player $i$ should choose a mixed strategy in the best response set $b^i(\pi^{-i}) = \operatorname{argmax}_{\tilde{\pi}^i \in \Delta^i} r^i(\tilde{\pi}^i, \pi^{-i})$

  6. Best responses Assume the other players use mixed strategy $\pi^{-i}$. Player $i$ should choose a mixed strategy in the best response set $b^i(\pi^{-i}) = \operatorname{argmax}_{\tilde{\pi}^i \in \Delta^i} r^i(\tilde{\pi}^i, \pi^{-i})$ A Nash equilibrium is a fixed point of the best response map: $\pi^i \in b^i(\pi^{-i})$ for all $i$

  7. A problem with Nash Consider the game $\begin{pmatrix} (2,0) & (0,1) \\ (0,2) & (1,0) \end{pmatrix}$ with unique Nash equilibrium $\pi^1 = (2/3, 1/3)$, $\pi^2 = (1/3, 2/3)$

  8. A problem with Nash Consider the game $\begin{pmatrix} (2,0) & (0,1) \\ (0,2) & (1,0) \end{pmatrix}$ with unique Nash equilibrium $\pi^1 = (2/3, 1/3)$, $\pi^2 = (1/3, 2/3)$ • $r^i(a^i, \pi^{-i}) = 2/3$ for each $i$, $a^i$ • How does Player 1 know to use $\pi^1 = (2/3, 1/3)$? • Player 2 to use $\pi^2 = (1/3, 2/3)$?
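
A quick numerical check of the indifference claim above, as a minimal sketch (the NumPy payoff-array layout is an illustrative assumption):

```python
import numpy as np

# Bimatrix game from slide 7: entry (row, col) holds (Player 1 reward, Player 2 reward).
R1 = np.array([[2.0, 0.0],
               [0.0, 1.0]])   # Player 1's rewards
R2 = np.array([[0.0, 1.0],
               [2.0, 0.0]])   # Player 2's rewards

pi1 = np.array([2/3, 1/3])    # Player 1's equilibrium strategy
pi2 = np.array([1/3, 2/3])    # Player 2's equilibrium strategy

# Expected reward of each pure action against the opponent's equilibrium strategy.
print(R1 @ pi2)     # [0.667 0.667] -- Player 1 is indifferent between its actions
print(R2.T @ pi1)   # [0.667 0.667] -- Player 2 is indifferent between its actions
```

Both players are indifferent between their pure actions at the equilibrium, which is exactly why nothing in the payoffs alone tells a player to use those particular mixtures.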

  9. Learning in games • Attempts to justify equilibrium play as the end point of a learning process • Generally assumes pretty stupid players! • Related to evolutionary game theory

  10. Multi-armed bandits At time $n$, choose action $a_n$, and receive reward $R_n$

  11. Multi-armed bandits Estimate after time $n$ of the expected reward for action $a \in A$ is: $Q_n(a) = \frac{\sum_{m \le n : a_m = a} R_m}{\kappa_n(a)}$ where $\kappa_n(a) = \sum_{m=1}^{n} I\{a_m = a\}$

  12. Multi-armed bandits If $a_n \neq a$, then $\kappa_n(a) = \kappa_{n-1}(a)$ and: $Q_n(a) = \frac{\left(\sum_{m=1}^{n-1} I\{a_m = a\} R_m\right) + 0}{\kappa_{n-1}(a)} = Q_{n-1}(a)$

  13. Multi-armed bandits If $a_n = a$, $Q_n(a) = \frac{\left(\sum_{m=1}^{n-1} I\{a_m = a\} R_m\right) + R_n}{\kappa_n(a)} = \left(1 - \frac{1}{\kappa_n(a)}\right) Q_{n-1}(a) + \frac{1}{\kappa_n(a)} R_n$

  14. Multi-armed bandits Update estimates using $Q_n(a) = \begin{cases} Q_{n-1}(a) + \frac{1}{\kappa_n(a)}\{R_n - Q_{n-1}(a)\} & \text{if } a_n = a \\ Q_{n-1}(a) & \text{if } a_n \neq a \end{cases}$ At time $n+1$ use $Q_n$ to choose an action $a_{n+1}$
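
A minimal sketch of this incremental update in code (the $\epsilon$-greedy action selection is an illustrative assumption; the slides do not commit to a particular choice rule):

```python
import random
from collections import defaultdict

def bandit(actions, reward, steps=1000, eps=0.1):
    """Sample-average estimates Q_n(a), maintained incrementally via counts kappa_n(a)."""
    Q = defaultdict(float)     # Q_n(a): running average reward for action a
    kappa = defaultdict(int)   # kappa_n(a): number of times a has been played
    for n in range(steps):
        # Choose a_{n+1} using Q_n: here epsilon-greedy (an assumption).
        if random.random() < eps or not Q:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[x])
        R = reward(a)          # receive reward R_n
        kappa[a] += 1
        # Q_n(a) = Q_{n-1}(a) + (1 / kappa_n(a)) * (R_n - Q_{n-1}(a)); other actions unchanged.
        Q[a] += (R - Q[a]) / kappa[a]
    return dict(Q)
```

For example, `bandit(['left', 'right'], lambda a: random.gauss(1.0 if a == 'left' else 0.5, 1.0))` gives estimates near the two arms' true means, with the better arm sampled most often.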

  15. Fictitious play At iteration $n+1$, player $i$: • forms beliefs $\sigma^{-i}_n \in \Delta^{-i}$ about the other players' strategies • chooses an action in $b^i(\sigma^{-i}_n)$

  16. Belief formation The beliefs about player $j$ are simply the MLE: $\sigma^j_n(a^j) = \frac{\kappa^j_n(a^j)}{n}$ where $\kappa^j_n(a^j) = \sum_{m=1}^{n} I\{a^j_m = a^j\}$

  17. Belief formation The beliefs about player $j$ are simply the MLE: $\sigma^j_n(a^j) = \frac{\kappa^j_n(a^j)}{n}$ where $\kappa^j_n(a^j) = \sum_{m=1}^{n} I\{a^j_m = a^j\}$ Recursive update: $\sigma^j_{n+1}(a^j) = \frac{\kappa^j_{n+1}(a^j)}{n+1} = \frac{\kappa^j_n(a^j) + I\{a^j_{n+1} = a^j\}}{n+1}$

  18. Belief formation The beliefs about player $j$ are simply the MLE: $\sigma^j_n(a^j) = \frac{\kappa^j_n(a^j)}{n}$ where $\kappa^j_n(a^j) = \sum_{m=1}^{n} I\{a^j_m = a^j\}$ Recursive update: $\sigma^j_{n+1}(a^j) = \frac{\kappa^j_{n+1}(a^j)}{n+1} = \frac{\kappa^j_n(a^j) + I\{a^j_{n+1} = a^j\}}{n+1} = \frac{n}{n+1}\,\sigma^j_n(a^j) + \frac{I\{a^j_{n+1} = a^j\}}{n+1}$

  19. Belief formation The beliefs about player $j$ are simply the MLE: $\sigma^j_n(a^j) = \frac{\kappa^j_n(a^j)}{n}$ where $\kappa^j_n(a^j) = \sum_{m=1}^{n} I\{a^j_m = a^j\}$ Recursive update: $\sigma^j_{n+1}(a^j) = \left(1 - \frac{1}{n+1}\right) \sigma^j_n(a^j) + \frac{1}{n+1}\, I\{a^j_{n+1} = a^j\}$

  20. Belief formation The beliefs about player $j$ are simply the MLE: $\sigma^j_n(a^j) = \frac{\kappa^j_n(a^j)}{n}$ where $\kappa^j_n(a^j) = \sum_{m=1}^{n} I\{a^j_m = a^j\}$ Recursive update: $\sigma^j_{n+1} = \left(1 - \frac{1}{n+1}\right) \sigma^j_n + \frac{1}{n+1}\, e_{a^j_{n+1}}$

  21. Belief formation The beliefs about player $j$ are simply the MLE: $\sigma^j_n(a^j) = \frac{\kappa^j_n(a^j)}{n}$ where $\kappa^j_n(a^j) = \sum_{m=1}^{n} I\{a^j_m = a^j\}$ Recursive update: $\sigma^j_{n+1} = \left(1 - \frac{1}{n+1}\right) \sigma^j_n + \frac{1}{n+1}\, e_{a^j_{n+1}}$ In terms of best responses: $\sigma^j_{n+1} \in \left(1 - \frac{1}{n+1}\right) \sigma^j_n + \frac{1}{n+1}\, b^j(\sigma^{-j}_n)$

  22. Belief formation The beliefs about player $j$ are simply the MLE: $\sigma^j_n(a^j) = \frac{\kappa^j_n(a^j)}{n}$ where $\kappa^j_n(a^j) = \sum_{m=1}^{n} I\{a^j_m = a^j\}$ Recursive update: $\sigma^j_{n+1} = \left(1 - \frac{1}{n+1}\right) \sigma^j_n + \frac{1}{n+1}\, e_{a^j_{n+1}}$ In terms of best responses: $\sigma_{n+1} \in \left(1 - \frac{1}{n+1}\right) \sigma_n + \frac{1}{n+1}\, b(\sigma_n)$
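
A minimal sketch of this recursion for the 2×2 game of slide 7, with uniform initial beliefs and ties broken by the first maximiser (both choices are assumptions made only for illustration):

```python
import numpy as np

R1 = np.array([[2.0, 0.0], [0.0, 1.0]])   # Player 1's rewards (game from slide 7)
R2 = np.array([[0.0, 1.0], [2.0, 0.0]])   # Player 2's rewards

def fictitious_play(steps=100000):
    sigma1 = np.array([0.5, 0.5])   # belief held about Player 1 (assumed uniform start)
    sigma2 = np.array([0.5, 0.5])   # belief held about Player 2
    for n in range(1, steps + 1):
        # Each player plays a best response to its belief about the other.
        a1 = int(np.argmax(R1 @ sigma2))
        a2 = int(np.argmax(R2.T @ sigma1))
        # sigma_{n+1} = (1 - 1/(n+1)) sigma_n + (1/(n+1)) e_{a_{n+1}}
        sigma1 += (np.eye(2)[a1] - sigma1) / (n + 1)
        sigma2 += (np.eye(2)[a2] - sigma2) / (n + 1)
    return sigma1, sigma2

print(fictitious_play())   # beliefs approach the unique equilibrium (2/3, 1/3), (1/3, 2/3)
```

Whether the beliefs actually reach the equilibrium is exactly the convergence question addressed by the stochastic approximation argument that follows.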

  23. Stochastic approximation

  24. Stochastic approximation $\theta_{n+1} \in \theta_n + \alpha_{n+1} \{F(\theta_n) + M_{n+1}\}$

  25. Stochastic approximation $\theta_{n+1} \in \theta_n + \alpha_{n+1} \{F(\theta_n) + M_{n+1}\}$ • $F : \Theta \to \Theta$ is a (bounded u.s.c.) set-valued map • $\alpha_n \to 0$, $\sum_n \alpha_n = \infty$ • For any $T > 0$, $\lim_{n \to \infty} \sup_{k > n : \sum_{i=n}^{k-1} \alpha_{i+1} \le T} \left\| \sum_{i=n}^{k-1} \alpha_{i+1} M_{i+1} \right\| = 0$ The last condition is implied by: $\sum_n (\alpha_n)^2 < \infty$, $E[M_{n+1} \mid \theta_n] \to 0$, and $\mathrm{Var}[M_{n+1} \mid \theta_n] < C$ almost surely.

  26. Stochastic approximation $\theta_{n+1} \in \theta_n + \alpha_{n+1} \{F(\theta_n) + M_{n+1}\}$, i.e. $\frac{\theta_{n+1} - \theta_n}{\alpha_{n+1}} \in F(\theta_n) + M_{n+1}$ ↓ $\frac{d}{dt}\theta \in F(\theta)$, a differential inclusion (Benaïm, Hofbauer and Sorin, 2005)

  27. Stochastic approximation $\theta_{n+1} \in \theta_n + \alpha_{n+1} \{F(\theta_n) + M_{n+1}\}$ In fictitious play: $\sigma_{n+1} \in \sigma_n + \frac{1}{n+1} \{b(\sigma_n) - \sigma_n\}$ ↓ $\frac{d}{dt}\sigma \in b(\sigma) - \sigma$, the best response differential inclusion. Hence $\sigma_n$ converges to the set of Nash equilibria in zero-sum games, potential games, and generic $2 \times m$ games.

  28. Generalised weakened fictitious play

  29. Weakened fictitious play • Van der Genugten (2000) showed that the convergence rate of fictitious play can be improved if players use $\epsilon_n$-best responses (for 2-player zero-sum games, and a very specific choice of $\epsilon_n$) • $\pi \in b^{\epsilon_n}(\sigma_n) \Rightarrow \pi \in b(\sigma_n) + M_{n+1}$ where $M_n \to 0$ as $\epsilon_n \to 0$ (by continuity properties of $b$ and boundedness of $r$) • For general games and general $\epsilon_n \to 0$ this fits into the stochastic approximation framework

  30. Generalised weakened fictitious play Theorem: Any process such that $\sigma_{n+1} \in \sigma_n + \alpha_{n+1} \{b^{\epsilon_n}(\sigma_n) - \sigma_n + M_{n+1}\}$ where • $\epsilon_n \to 0$ as $n \to \infty$ • $\alpha_n \to 0$ as $n \to \infty$ • $\lim_{n \to \infty} \sup_{k > n : \sum_{i=n}^{k-1} \alpha_{i+1} \le T} \left\| \sum_{i=n}^{k-1} \alpha_{i+1} M_{i+1} \right\| = 0$ converges to the set of Nash equilibria for zero-sum games, potential games and generic $2 \times m$ games.
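
The key new object in the theorem is the $\epsilon_n$-best response set $b^{\epsilon_n}$. A minimal sketch, restricted to pure actions for simplicity (this particular construction is an illustrative assumption):

```python
import numpy as np

def eps_best_responses(R, sigma_opp, eps):
    """Pure actions whose expected reward against sigma_opp is within eps of optimal."""
    values = R @ sigma_opp                           # expected reward of each pure action
    return np.flatnonzero(values >= values.max() - eps)
```

Any mixture over the actions returned here earns within $\epsilon$ of the best achievable reward against `sigma_opp`, so playing such a mixture is one way to realise the $b^{\epsilon_n}(\sigma_n)$ term in the theorem.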

  31. Recency • For classical fictitious play $\alpha_n = \frac{1}{n}$, $\epsilon_n \equiv 0$ and $M_n \equiv 0$ • For any $\alpha_n \to 0$ the conditions are met (since $M_n \equiv 0$) • How about $\alpha_n = \frac{1}{\sqrt{n}}$, or even $\alpha_n = \frac{1}{\log n}$?

  32. Recency [Figure: Belief that Player 1 plays Heads over 200 plays of the two-player matching pennies game under classical fictitious play (top), under a modified fictitious play with $\alpha_n = \frac{1}{\sqrt{n}}$ (middle), and with $\alpha_n = \frac{1}{\log n}$ (bottom).]
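
A sketch of the kind of simulation behind this figure, assuming the standard ±1 matching pennies payoffs and uniform initial beliefs (both assumptions, since the slide does not give them):

```python
import numpy as np

# Matching pennies: Player 1 wins on a match, Player 2 wins on a mismatch.
R1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
R2 = -R1

def recency_fp(alpha, steps=200):
    """Fictitious play with a general step size alpha(n); returns the belief that Player 1 plays Heads."""
    sigma1, sigma2 = np.array([0.5, 0.5]), np.array([0.5, 0.5])
    trace = []
    for n in range(1, steps + 1):
        a1 = int(np.argmax(R1 @ sigma2))
        a2 = int(np.argmax(R2.T @ sigma1))
        # sigma_{n+1} = (1 - alpha_{n+1}) sigma_n + alpha_{n+1} e_{a_{n+1}}
        sigma1 += alpha(n) * (np.eye(2)[a1] - sigma1)
        sigma2 += alpha(n) * (np.eye(2)[a2] - sigma2)
        trace.append(sigma1[0])
    return trace

classical = recency_fp(lambda n: 1 / (n + 1))
sqrt_step = recency_fp(lambda n: 1 / np.sqrt(n + 1))
log_step  = recency_fp(lambda n: 1 / np.log(n + 2))   # shifted so the step stays below 1
```

Larger step sizes weight recent play more heavily, which is the recency effect the three panels compare.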

  33. Stochastic fictitious play In fictitious play, players always choose pure actions ⇒ strategies never converge to mixed strategies (beliefs do, but played strategies do not)

  34. Stochastic fictitious play Instead consider smooth best responses: $\beta^i_\tau(\sigma^{-i}) = \operatorname{argmax}_{\pi^i \in \Delta^i} \left\{ r^i(\pi^i, \sigma^{-i}) + \tau v(\pi^i) \right\}$ For example $\beta^i_\tau(\sigma^{-i})(a^i) = \frac{\exp\{r^i(a^i, \sigma^{-i})/\tau\}}{\sum_{a \in A^i} \exp\{r^i(a, \sigma^{-i})/\tau\}}$

  35. Stochastic fictitious play Instead consider smooth best responses: $\beta^i_\tau(\sigma^{-i}) = \operatorname{argmax}_{\pi^i \in \Delta^i} \left\{ r^i(\pi^i, \sigma^{-i}) + \tau v(\pi^i) \right\}$ For example $\beta^i_\tau(\sigma^{-i})(a^i) = \frac{\exp\{r^i(a^i, \sigma^{-i})/\tau\}}{\sum_{a \in A^i} \exp\{r^i(a, \sigma^{-i})/\tau\}}$ Strategies evolve according to $\sigma_{n+1} = \sigma_n + \frac{1}{n+1} \{\beta_\tau(\sigma_n) + M_{n+1} - \sigma_n\}$ where $E[M_{n+1} \mid \sigma_n] = 0$
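
A minimal sketch of the logit smooth best response given above (the payoff-matrix representation is an assumption made for illustration):

```python
import numpy as np

def logit_best_response(R, sigma_opp, tau):
    """Logit (Boltzmann) smooth best response beta^i_tau to the opponent's mixed strategy."""
    values = R @ sigma_opp                    # r^i(a^i, sigma^{-i}) for each pure action a^i
    z = (values - values.max()) / tau         # subtract the max for numerical stability
    weights = np.exp(z)
    return weights / weights.sum()            # a full-support mixed strategy
```

In stochastic fictitious play the action actually played is sampled from this distribution, and $M_{n+1}$ is the resulting zero-mean sampling noise.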

  36. Convergence $\sigma_{n+1} = \sigma_n + \frac{1}{n+1} \{\beta_\tau(\sigma_n) - \sigma_n + M_{n+1}\}$

  37. Convergence $\sigma_{n+1} = \sigma_n + \frac{1}{n+1} \{\beta_\tau(\sigma_n) - \sigma_n + M_{n+1}\} \in \sigma_n + \frac{1}{n+1} \{b^\epsilon(\sigma_n) - \sigma_n + M_{n+1}\}$

  38. Convergence $\sigma_{n+1} = \sigma_n + \frac{1}{n+1} \{\beta_\tau(\sigma_n) - \sigma_n + M_{n+1}\} \in \sigma_n + \frac{1}{n+1} \{b^\epsilon(\sigma_n) - \sigma_n + M_{n+1}\}$ But we can now consider the effect of using the smooth best response $\beta_{\tau_n}$ with $\tau_n \to 0$... it means that $\epsilon_n \to 0$, resulting in a GWFP!

  39. Random belief learning

  40. Random beliefs (Friedman and Mezzetti 2005) Best response ‘assumes’ complete confidence in: • knowledge of the reward functions • beliefs σ about opponent strategy

  41. Random beliefs (Friedman and Mezzetti 2005) Best response ‘assumes’ complete confidence in: • knowledge of the reward functions • beliefs σ about opponent strategy

  42. Random beliefs (Friedman and Mezzetti 2005) Best response ‘assumes’ complete confidence in: • knowledge of the reward functions • beliefs $\sigma$ about opponent strategy Uncertainty in the beliefs $\sigma_n$ ↔ a distribution on belief space

  43. Belief distributions • The belief about player $j$ is that $\pi^j \sim \mu^j$ • $E_{\mu^j}[\pi^j] = \sigma^j$, the focus of $\mu^j$.
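
As a concrete illustration (the Dirichlet family is an assumption here, not something the slides prescribe), a belief distribution with focus $\sigma^j$ can be sampled as follows; the concentration parameter controls how tightly the sampled strategies cluster around the focus:

```python
import numpy as np

def sample_random_belief(focus, concentration, rng=np.random.default_rng()):
    """Draw an opponent strategy pi^j ~ mu^j whose mean (the focus) is the given vector.

    Uses a Dirichlet(concentration * focus) distribution, whose mean equals the focus.
    """
    return rng.dirichlet(concentration * np.asarray(focus))

# Example: beliefs with focus (1/3, 2/3); a larger concentration means less uncertainty.
print(sample_random_belief([1/3, 2/3], concentration=10.0))
```

A random-belief learner would then best respond to a strategy drawn from $\mu^j$ rather than to the focus $\sigma^j$ itself.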
