SLIDE 1

Beliefs and Learning in Repeated Games

Florin Constantin and Ivo Parashkevov March 15, 2006

SLIDE 2

Context

  • 2-player discounted repeated games [can be extended to n-player]

  • want to provably learn equilibrium play, as quickly as possible and with as little info as possible

SLIDE 3

Rational (Bayesian) Learning

  • use beliefs about opponents’ strategies to guide prediction of future play

  • play a Best Response to beliefs

  • update beliefs based on actual play

  • learning = repeatedly update beliefs until convergence to equilibrium
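The loop above can be sketched in code. Everything concrete below (the two candidate opponent types, their action probabilities, and the payoff table) is an illustrative assumption added here, not from the slides:

```python
# Sketch of the rational (Bayesian) learning loop for one player in a
# repeated 2-action game. The opponent types and payoffs are illustrative.

def best_response(payoff, forecast):
    """Myopic best response to a forecast over the opponent's actions."""
    actions = list(payoff)
    return max(actions, key=lambda a: sum(p * payoff[a][b] for b, p in forecast.items()))

def bayes_update(belief, strategies, observed):
    """Posterior over candidate opponent strategies after one observed action."""
    post = {s: belief[s] * strategies[s].get(observed, 0.0) for s in belief}
    z = sum(post.values())
    return {s: p / z for s, p in post.items()}

# Toy belief: the opponent is either a mostly-cooperating or mostly-defecting type.
strategies = {"mostlyC": {"C": 0.9, "D": 0.1}, "mostlyD": {"C": 0.1, "D": 0.9}}
belief = {"mostlyC": 0.5, "mostlyD": 0.5}
payoff = {"C": {"C": 1, "D": -1}, "D": {"C": 2, "D": 0}}   # row player's payoffs

for observed in ["C", "C", "C"]:               # opponent keeps cooperating
    forecast = {a: sum(belief[s] * strategies[s].get(a, 0.0) for s in belief)
                for a in ["C", "D"]}
    action = best_response(payoff, forecast)   # play Best Response to beliefs
    belief = bayes_update(belief, strategies, observed)

print(round(belief["mostlyC"], 4))   # posterior on the cooperative type ≈ 0.9986
```

After three observed cooperations, the posterior concentrates on the cooperative type, illustrating the "update beliefs until convergence" step.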

SLIDE 4

Belief Learning vs. Bayesian Learning

  • Behavior Strategy: history → distribution over opponent’s play in next period.
    Example: h_opp = (C, C, D) → Pr_t=4(C) = 2/3, Pr_t=4(D) = 1/3

  • Belief Learning - prediction rule as behavior strategy: associate probabilities with future play of opponents based on play history. Best Respond to the prediction rule

  • Bayesian Learning - Best Respond to beliefs

  • Belief Learning as Bayesian Learning: Best Respond to the belief that puts probability 1 on the behavior strategy predicted by the prediction rule
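The empirical-frequency example above is easy to make concrete; this small sketch (added here for illustration) computes the forecast from h_opp = (C, C, D):

```python
from collections import Counter
from fractions import Fraction

# Empirical-frequency prediction rule from the slide's example:
# after observing the history, forecast the next action by relative frequency.
def frequency_forecast(history):
    counts = Counter(history)
    t = len(history)
    return {a: Fraction(counts[a], t) for a in counts}

forecast = frequency_forecast(["C", "C", "D"])
print(forecast["C"], forecast["D"])   # 2/3 1/3
```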

SLIDE 5

Belief Learning vs. Bayesian Learning II

  • Bayesian Learning as Belief Learning: For any belief B of player 1 over player 2’s behavior strategies, there exists an equivalent belief assigning probability 1 to a particular behavior strategy (called the reduced form of B). Prediction rule: predict the reduced form

SLIDE 6

Fictitious Play

  • P(opponent plays s at time t) = t/(t+k) · (freq of s up to time t) + k/(t+k) · prior(s)

  • Assumptions
    – myopia

  • if it converges, it converges to NE
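The forecast formula above can be checked directly; the uniform prior and weight k = 2 in this sketch are illustrative assumptions:

```python
from fractions import Fraction

# Fictitious-play forecast with a prior of weight k, following the formula:
# P(s at t) = t/(t+k) * freq(s up to t) + k/(t+k) * prior(s)
def fp_forecast(history, prior, k):
    t = len(history)
    freq = {s: Fraction(history.count(s), t) if t else Fraction(0) for s in prior}
    return {s: Fraction(t, t + k) * freq[s] + Fraction(k, t + k) * prior[s]
            for s in prior}

# Uniform prior over {C, D} with weight k = 2, after observing (C, C, D):
p = fp_forecast(["C", "C", "D"], {"C": Fraction(1, 2), "D": Fraction(1, 2)}, 2)
print(p["C"])   # 3/5 · 2/3 + 2/5 · 1/2 = 3/5
```

Note that as t grows, the prior term’s weight k/(t+k) vanishes and the forecast approaches the raw empirical frequency.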

SLIDE 7

Calibrated Learning

  • use forecasts; if
    – every player plays a Best Response to forecasts
    – forecasts are calibrated
    then learning converges to Correlated Equilibrium

  • history is the correlating device (umpire)

  • Assumptions
    – stationary tie-breaking rule

SLIDE 8

Problematic assumptions in papers so far

  • myopia
    – ignores strategic considerations about the future - cannot experiment for long-run benefit
    – can only implement those NE of the repeated game that consist of stage-game NE (e.g. no trigger strategies)

  • observable rewards

  • common prior

SLIDE 9

Kalai & Lehrer - Rational Learning

  • Setting:
    – n-player infinitely repeated discounted games
    – subjective rationality - best responding to beliefs
    – learning is through Bayesian updating of an individual prior
    – encode beliefs as behavior strategies

  • Main result: if individual beliefs are compatible with actual play, then best response to beliefs leads to accurate prediction of future play. Play converges to Nash equilibrium play.

SLIDE 10

Assumptions

  • Perfect monitoring - observe actions of other players

  • Independence of other players’ actions and beliefs

  • No longer assume common prior or myopia

  • Opponents not assumed to be rational

  • Knowledge of own payoff matrix

SLIDE 11

Some Notation

  • n finite sets Σ1, Σ2, ..., Σn of actions

  • Ht - set of histories of length t. H = ∪_t Ht is the set of all finite histories.

  • a behavior strategy of player i is a function fi: H → ∆(Σi), where ∆(Σi) denotes the set of probability distributions over Σi

  • µf is the probability distribution over the set of infinite play paths induced by the strategy vector f

SLIDE 12

Absolute Continuity and Grain of Truth Assumptions

What does it mean to have "beliefs compatible with actual play"?

  • Do not assign zero probability to events that can occur in the play of the game.

  • "Grain of Truth" - beliefs about the opponent’s play assign a (small) positive probability to the strategy actually chosen.
    – Sufficient, but stronger than needed.

  • Absolute Continuity - measure µf is absolutely continuous w.r.t. µg (µf ≪ µg) if µf(A) > 0 ⇒ µg(A) > 0 for all sets A ⊆ Σ∞.

  • Main result requires: actual µf ≪ belief µ̃i

SLIDE 13

Prisoner’s Dilemma Example

        D      C
  D    0,0    2,-1
  C   -1,2    1,1

  • Consider strategies
    – g∞: grim trigger
    – gt: use grim trigger until time t, then defect forever

  • P1 assigns probs (β0, β1, ..., β∞) to P2 playing (g0, g1, ..., g∞), βt > 0. P2 assigns probs (α0, α1, ..., α∞) to P1 playing (g0, g1, ..., g∞), αt > 0.

  • According to own learning parameters, P1 chooses gt1 and P2 chooses gt2.
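A minimal simulation (added here as an illustration, not from the slides) makes the gt strategies concrete; encoding g∞ as t_switch = None and truncating to a finite horizon are assumptions of this sketch:

```python
# g(t_switch) plays grim trigger until period t_switch, then defects forever;
# t_switch=None stands for g_inf (pure grim trigger).
def g(t_switch, period, opponent_defected):
    if opponent_defected:                      # grim-trigger punishment phase
        return "D"
    if t_switch is not None and period >= t_switch:
        return "D"                             # scheduled switch to defection
    return "C"

def play_path(t1, t2, horizon):
    """Path of play when P1 uses g(t1) and P2 uses g(t2)."""
    path, defected1, defected2 = [], False, False
    for period in range(horizon):
        a1 = g(t1, period, defected2)
        a2 = g(t2, period, defected1)
        path.append((a1, a2))
        defected1 = defected1 or a1 == "D"
        defected2 = defected2 or a2 == "D"
    return path

# With t1 = 3 < t2 = 5: mutual C before period 3, P1 defects first at period 3,
# then mutual defection from period 4 on.
print(play_path(3, 5, 6))
```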

SLIDE 14

Prisoner’s Dilemma Example

  • all events with positive probability in the game (C until time t < min(t1, t2), D after min(t1, t2), etc.) are assigned positive probability by players’ beliefs: beliefs are compatible with actual play.

  • learning must occur - if t1 < t2 then P2 will assign prob 1 to P1 playing gt1 from time t1+1 on. So P2 knows that P1 will defect forever.

  • P1 will not know P2’s strategy - she will only know that t2 > t1, but she will be able to predict that P2 will defect forever as well - future play is learned only on the play path.

SLIDE 15

Prisoner’s Dilemma Example

What if t1 = t2 = ∞?

  • after time t, P1 knows P2 did not play g0, ..., gt and assigns probabilities (β_{t+1}, ..., β_∞) / Σ_{r=t+1..∞} β_r to (g_{t+1}, ..., g_∞). Since β_∞ > 0,

      β_∞ / Σ_{r=t+1..∞} β_r → 1 as t → ∞

  • P1 becomes more and more confident that P2 is playing g∞, but never knows for sure.
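A numeric illustration of the limit above; the concrete prior (β_∞ = 1/2 and β_t = 2^-(t+2) for finite t, which sums to 1) is an assumption added here, not from the slides:

```python
# Posterior probability P1 places on g_inf after ruling out g_0, ..., g_t.
def posterior_on_g_inf(t):
    beta_inf = 0.5
    # approximate the finite tail sum_{r > t} beta_r = sum_{r > t} 2^-(r+2)
    tail = sum(2 ** -(r + 2) for r in range(t + 1, t + 200))
    return beta_inf / (beta_inf + tail)

for t in [0, 5, 20]:
    print(t, posterior_on_g_inf(t))
# the posterior on g_inf rises toward 1 but never reaches it
```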

SLIDE 16

Definitions

  • Let ε > 0 and µ, µ̃ two probability measures. µ is ε-close to µ̃ if ∃ a set Q such that
    – µ(Q) > 1 − ε and µ̃(Q) > 1 − ε
    – ∀A ⊆ Q, (1−ε)µ̃(A) ≤ µ(A) ≤ (1+ε)µ̃(A)

  • f plays ε-like g if µf is ε-close to µg.

  • Let f be a strategy, t ≥ 0 and h a history up to time t. The induced strategy fh is defined by fh(h′) = f(concat(h, h′)) for all finite histories h′.
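For finite distributions the ε-closeness definition can be checked directly. In this sketch (an illustration added here) it is enough to check the ratio condition on singletons, since the inequalities are preserved when summing over any A ⊆ Q:

```python
# Check epsilon-closeness of two finite probability distributions (dicts).
def eps_close(mu, mu_tilde, eps):
    keys = set(mu) | set(mu_tilde)
    # Q = points where the ratio condition holds for the singleton {x};
    # it then holds for every A subseteq Q by summation.
    Q = {x for x in keys
         if (1 - eps) * mu_tilde.get(x, 0) <= mu.get(x, 0) <= (1 + eps) * mu_tilde.get(x, 0)}
    return (sum(mu.get(x, 0) for x in Q) > 1 - eps
            and sum(mu_tilde.get(x, 0) for x in Q) > 1 - eps)

print(eps_close({"a": 0.5, "b": 0.5}, {"a": 0.52, "b": 0.48}, 0.1))   # True
```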

SLIDE 17

Theorem 1

Let f be the strategy vector that is actually chosen and fi be the beliefs of player i. Assume µf is absolutely continuous with respect to µfi. Then ∀ε > 0, for almost every play path z, ∃ a time T(z, ε) such that ∀t ≥ T(z, ε), f_z(t) plays ε-like fi_z(t).

If players maximize payoff then they will eventually be playing a subjective ε-equilibrium:

  • each player plays a Best Response to own beliefs

  • these beliefs are ε-never going to be contradicted by actual play

Interpretation?

SLIDE 18

Theorem 2

Let f be the strategy vector that is actually chosen and f1, ..., fn be the beliefs of players 1, ..., n. Assume

  • µf is absolutely continuous with respect to each µfi

  • each player plays a Best Response to its beliefs.

Then ∀ε > 0, for almost every play path z, ∃ a time T(z, ε) such that for all t ≥ T(z, ε) there exists an ε-Nash Equilibrium f̄ of the repeated game such that f_z(t) plays ε-like f̄.

SLIDE 19

Comments

  • Theorem 1 does not assume anything about players’ strategies.

  • Convergence of beliefs with reality occurs only on the actual play path. Players do not learn what their opponents would do in response to actions that will not be taken.

  • If players are best responding (Theorem 2), then convergence is to NE play in the repeated game, not to repeated play of a single stage NE.

  • Convergence is to an equilibrium play, not to an equilibrium. We are not learning Nash strategies, but we can learn to play as if we knew them.

SLIDE 20
So what?

  • If
    – assumptions are met
    – all other players play a Best Response to their beliefs
    can you do better?

SLIDE 21

Beliefs in Repeated Games - Nachbar 2005

Main Result

For a large class of repeated games, beliefs cannot simultaneously satisfy:

  • learnability

  • consistency

  • CSP (a diversity-of-beliefs condition)

SLIDE 22

Learnability - informally

Player 1 learns to predict the path of play generated by σ2 if her one-period-ahead forecasts along the path of play eventually become almost as accurate as if she knew σ2.

SLIDE 23

Learnability - formally

Fix a belief β2 of player 1 about player 2’s strategy. Player 1 learns to predict the path of play generated by behavior strategy σ = (σ1, σ2) iff

  • ∀ finite history h,
    µ(σ1,σ2)(h∗) > 0 ⇒ µ(σ1,β2)(h∗) > 0
    where h∗ = the set of all paths of play starting with h

  • ∀ε > 0 and for almost all paths of play z, ∃ T(ε, z) such that for any time t > T(ε, z) and any action a2 of player 2,
    |(the prob that σ2(h) assigns to a2) − (the prob that σ^β2(h) assigns to a2)| < ε
    where h = the first t stages of z and σ^β2 = the reduced form of β2.

SLIDE 24

CSP

Two conditions: CS and P, both addressing the richness of Σ̂. All restrictions are only on the path of play!

  • CS - (Weak) Caution and Symmetry
    – s1 is a simple variant of s2 if s1 can be generated from s2 by a uniform relabeling of actions
    – Weak Caution means: if I believe you could play the pure strategy s1, then I also believe you could play all simple variants of s1. Strong caution would mean Ŝi = Si
    – Symmetry means: if I believe that you can play s1, then you believe I can play all simple variants of s1.

23

SLIDE 25

    – Symmetry is motivated by the necessity of equally powerful strategy-generating machines.

  • P
    – if a behavior strategy σ2 is in Σ̂2, then at least one pure strategy that coarsely approximates σ2 is in Σ̂2 as well.

SLIDE 26

Consistency

Fix ε ≥ 0 and beliefs β1 and β2. Σ̂ ⊂ Σ is ε-consistent iff player 1 has a uniform ε-Best Response (to belief β2) in Σ̂1 and player 2 has a uniform ε-Best Response (to belief β1) in Σ̂2.

SLIDE 27

MM and No Weak Dominance

No Weak Dominance (NWD): no player has a (weakly) dominant action.

MM: Mi < mi for each player i - true e.g. for matching pennies, rock-paper-scissors, ..., where

  m1 = min_{α2∈∆(A2)} max_{α1∈∆(A1)} u1(α1, α2)

is player 1’s minmax payoff and

  M1 = max_{a1∈A1} min_{a2∈A2} u1(a1, a2)

is player 1’s pure-action maxmin payoff.
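As a sketch (added here, not from the slides), one can check MM for matching pennies numerically. Since the inner max over mixed α1 is attained at a pure action, a grid over the column mixture α2 suffices for a 2x2 game:

```python
# Row player's payoffs u1(a1, a2) in matching pennies.
u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
A = ["H", "T"]

# pure-action maxmin: M1 = max_{a1} min_{a2} u1(a1, a2)
M1 = max(min(u1[a1, a2] for a2 in A) for a1 in A)

# mixed minmax: m1 = min_{alpha2} max_{a1} E[u1(a1, alpha2)],
# approximated by a fine grid over alpha2 = (p, 1-p).
m1 = min(max(p * u1[a1, "H"] + (1 - p) * u1[a1, "T"] for a1 in A)
         for p in (i / 1000 for i in range(1001)))

print(M1, m1)   # -1 0.0 : MM holds since M1 < m1
```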

SLIDE 28

Main result - a bit more formally

If NWD or MM holds for the stage game, then for any δ > 0 there exists εδ > 0 such that for any Σ̂ ⊂ Σ and for any beliefs, if Σ̂ satisfies (pure weak) learnability & CSP, then Σ̂ is not εδ-consistent.

Interpretation: if beliefs have supports that are learnable and sufficiently diverse, then the beliefs are inconsistent.

SLIDE 29

Discussion

  • if you are able to learn, does that mean you already had some knowledge about the problem?

  • is consistency necessary for convergence or belief learning?

  • the impossibility result is not about convergence and learning per se - but what is it about?

  • how useful/natural/hard to work with are these conditions?
