

SLIDE 1

Multi-agent learning: Fictitious Play

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Last modified on February 27th, 2012 at 18:35.

SLIDE 2

Fictitious play: motivation

  • Rather than considering your own payoffs, monitor the behaviour of your opponent(s), and respond optimally.
  • The behaviour of an opponent is projected onto a single mixed strategy.
  • Brown (1951): proposed as an explanation for Nash equilibrium play. In terms of current use, the name is a bit of a misnomer, since play actually occurs (Berger, 2005).
  • One of the most important, if not the most important, representatives of a follower strategy.

SLIDE 3

Plan for today

Part I. Best reply strategy
  • 1. Pure fictitious play.
  • 2. Results that connect pure fictitious play to Nash equilibria.

Part II. Extensions and approximations of fictitious play
  • 1. Smoothed fictitious play.
  • 2. Exponentiated regret matching.
  • 3. No-regret property of smoothed fictitious play (Fudenberg et al., 1995).
  • 4. Convergence of better-reply strategies when players have limited memory and are inert [tend to stick to their current strategy] (Young, 1998).

Literature:
  • Shoham et al. (2009): Multi-agent Systems. Ch. 7: “Learning and Teaching”.
  • H. Young (2004): Strategic Learning and its Limits, Oxford UP.
  • D. Fudenberg and D.K. Levine (1998): The Theory of Learning in Games, MIT Press.

SLIDE 4

Part I: Pure fictitious play

SLIDE 5

Repeated Coordination Game

Players receive payoff p > 0 iff they coordinate. This game possesses three Nash equilibria, viz. (0, 0), (0.5, 0.5), and (1, 1).

Beliefs are counts of the opponent's past actions (L, R), shown after the round's play; row 0 gives the initial counts. An asterisk marks a round in which the player's belief counts are tied, so that it is indifferent and breaks the tie arbitrarily.

Round | A's action | B's action | A's beliefs | B's beliefs
0     |            |            | (0.0, 0.0)  | (0.0, 0.0)
1     | L*         | R*         | (0.0, 1.0)  | (1.0, 0.0)
2     | R          | L          | (1.0, 1.0)  | (1.0, 1.0)
3     | L*         | R*         | (1.0, 2.0)  | (2.0, 1.0)
4     | R          | L          | (2.0, 2.0)  | (2.0, 2.0)
5     | R*         | R*         | (2.0, 3.0)  | (2.0, 3.0)
6     | R          | R          | (2.0, 4.0)  | (2.0, 4.0)
7     | R          | R          | (2.0, 5.0)  | (2.0, 5.0)
…     | …          | …          | …           | …
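The belief updating in this table is mechanical enough to simulate. Below is a minimal sketch of two-player pure fictitious play (not part of the slides; the payoff convention, the priors, and the uniformly random tie-breaking are my assumptions — the slides leave tie-breaking unspecified):

```python
import numpy as np

def best_reply(payoffs, opp_counts, rng):
    """Best response against the empirical mixed strategy of the opponent.

    payoffs[i, j] = this player's payoff for own action i against opponent action j.
    Ties are broken uniformly at random (an assumption)."""
    total = opp_counts.sum()
    belief = opp_counts / total if total > 0 else np.full(len(opp_counts), 1 / len(opp_counts))
    expected = payoffs @ belief                     # expected payoff of each own action
    best = np.flatnonzero(np.isclose(expected, expected.max()))
    return int(rng.choice(best))

def fictitious_play(payoff_a, payoff_b, rounds, prior_a=None, prior_b=None, seed=0):
    """Pure fictitious play for two players; payoff_x[i, j] is indexed
    (A's action i, B's action j). Returns the list of joint actions played."""
    rng = np.random.default_rng(seed)
    n_a, n_b = payoff_a.shape
    counts_a = np.zeros(n_b) if prior_a is None else np.asarray(prior_a, float)  # A's counts of B's actions
    counts_b = np.zeros(n_a) if prior_b is None else np.asarray(prior_b, float)  # B's counts of A's actions
    history = []
    for _ in range(rounds):
        a = best_reply(payoff_a, counts_a, rng)
        b = best_reply(payoff_b.T, counts_b, rng)   # transpose: B's own action must index the rows
        counts_a[b] += 1
        counts_b[a] += 1
        history.append((a, b))
    return history

# The coordination game of this slide: payoff p iff both choose the same action
# (actions 0 = L, 1 = R). One possible run; the tie-breaks are random.
p = 1.0
coord = np.array([[p, 0.0], [0.0, p]])
print(fictitious_play(coord, coord, rounds=8))
```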

SLIDE 6

Steady states are pure (but possibly weak) Nash equilibria

Definition (Steady state). An action profile a is a steady state (or absorbing state) of fictitious play if, whenever a is played at round t, it is inevitably also played at round t + 1.

  • Theorem. If a pure strategy profile is a steady state of fictitious play, then it is a (possibly weak) Nash equilibrium of the stage game.
  • Proof. Suppose a = (a1, . . . , an) is a steady state. Consequently, i's opponent model converges to a−i, for all i. By definition of fictitious play, i plays best responses to a−i, i.e.,

    ∀i : ai ∈ BR(a−i).

  The latter is precisely the definition of a Nash equilibrium. Still, the resulting Nash equilibrium is usually strict: at a weak equilibrium the process is likely to drift away, because alternative best responses exist.

SLIDE 7

Pure strict Nash equilibria are steady states

  • Theorem. If a pure strategy profile is a strict Nash equilibrium of a stage game, then it is a steady state of fictitious play in the repeated game. (Notice the use of terminology: “pure strategy profile” for Nash equilibria; “action profile” for steady states.)
  • Proof. Suppose a is a pure strict Nash equilibrium and ai is played at round t, for all i. Because a is strict, ai is the unique best response to a−i. Because this argument holds for each i, action profile a will be played again in round t + 1.
  • Summary of the two theorems:

    Pure strict Nash ⇒ Steady state ⇒ Pure Nash.

  But what if pure Nash equilibria do not exist?

SLIDE 8

Repeated game of Matching Pennies

Zero-sum game. A's goal is to have the pennies matched; B maintains the opposite. Beliefs are counts of the opponent's past actions (H, T); the run starts from the non-zero initial counts in row 0.

Round | A's action | B's action | A's beliefs | B's beliefs
0     |            |            | (1.5, 2.0)  | (2.0, 1.5)
1     | T          | T          | (1.5, 3.0)  | (2.0, 2.5)
2     | T          | H          | (2.5, 3.0)  | (2.0, 3.5)
3     | T          | H          | (3.5, 3.0)  | (2.0, 4.5)
4     | H          | H          | (4.5, 3.0)  | (3.0, 4.5)
5     | H          | H          | (5.5, 3.0)  | (4.0, 4.5)
6     | H          | H          | (6.5, 3.0)  | (5.0, 4.5)
7     | H          | T          | (6.5, 4.0)  | (6.0, 4.5)
8     | H          | T          | (6.5, 5.0)  | (7.0, 4.5)
…     | …          | …          | …           | …
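For reference, this run can be reproduced with the fictitious_play sketch from SLIDE 5, assuming payoffs ±1 and feeding the counts of row 0 in as priors:

```python
# Matching Pennies with the sketch from SLIDE 5 (actions 0 = H, 1 = T; payoffs ±1 assumed).
# A wins when the pennies match; B wins when they differ.
import numpy as np
mp_a = np.array([[1.0, -1.0], [-1.0, 1.0]])
history = fictitious_play(mp_a, -mp_a, rounds=8, prior_a=(1.5, 2.0), prior_b=(2.0, 1.5))
print(history)   # (T,T), (T,H), (T,H), (H,H), ... : the table's run, up to tie-breaking
```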

SLIDE 9

Convergent empirical distribution of strategies

  • Theorem. If the empirical distribution of each player's strategies converges in fictitious play, then it converges to a Nash equilibrium.
  • Proof. Same as before. If the empirical distributions converge to q, then i's opponent model converges to q−i, for all i. By definition of fictitious play, qi ∈ BR(q−i). Because of convergence, all such (mixed) best replies remain the same. By definition we have a Nash equilibrium.
  • Remarks:
  • 1. The qi may be mixed.
  • 2. It actually suffices that the q−i converge asymptotically to the actual distribution (Fudenberg & Levine, 1998).
  • 3. If the empirical distributions converge (hence converge to a Nash equilibrium), the actually played responses per stage need not be Nash equilibria of the stage game.

SLIDE 10

Empirical distributions converge to Nash ⇏ stage play is Nash

Repeated Coordination Game. Players receive payoff p > 0 iff they coordinate.

Round | A's action | B's action | A's beliefs | B's beliefs
0     |            |            | (0.5, 1.0)  | (1.0, 0.5)
1     | B          | A          | (1.5, 1.0)  | (1.0, 1.5)
2     | A          | B          | (1.5, 2.0)  | (2.0, 1.5)
3     | B          | A          | (2.5, 2.0)  | (2.0, 2.5)
4     | A          | B          | (2.5, 3.0)  | (3.0, 2.5)
…     | …          | …          | …           | …

  • This game possesses three equilibria, viz. (0, 0), (0.5, 0.5), and (1, 1), with expected payoffs p, p/2, and p (i.e. 1, 0.5, and 1 for p = 1), respectively.
  • The empirical distribution of play converges to (0.5, 0.5), yet the realised payoff is 0 rather than p/2: the players miscoordinate in every single round.

SLIDE 11

Empirical distribution of play need not converge

Rock-paper-scissors where the winner receives payoff p > 0, and both players receive zero otherwise.

  • Rock-paper-scissors with these payoffs is known as the Shapley game.
  • The Shapley game possesses one equilibrium, viz. (1/3, 1/3, 1/3), with expected payoff p/3.

Beliefs are counts over (Rock, Paper, Scissors):

Round | A's action | B's action | A's beliefs     | B's beliefs
0     |            |            | (0.0, 0.0, 0.5) | (0.0, 0.5, 0.0)
1     | Rock       | Scissors   | (0.0, 0.0, 1.5) | (1.0, 0.5, 0.0)
2     | Rock       | Paper      | (0.0, 1.0, 1.5) | (2.0, 0.5, 0.0)
3     | Rock       | Paper      | (0.0, 2.0, 1.5) | (3.0, 0.5, 0.0)
4     | Scissors   | Paper      | (0.0, 3.0, 1.5) | (3.0, 0.5, 1.0)
5     | Scissors   | Paper      | (0.0, 4.0, 1.5) | (3.0, 0.5, 2.0)
…     | …          | …          | …               | …
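Fed the Shapley payoffs and the priors of row 0, the sketch from SLIDE 5 exhibits exactly this non-convergence (a usage sketch; p = 1 assumed):

```python
# The Shapley game with the sketch from SLIDE 5 (actions 0 = Rock, 1 = Paper, 2 = Scissors).
import numpy as np
shapley = np.array([[0.0, 0.0, 1.0],    # Rock beats Scissors
                    [1.0, 0.0, 0.0],    # Paper beats Rock
                    [0.0, 1.0, 0.0]])   # Scissors beats Paper
history = fictitious_play(shapley, shapley.T, rounds=300,
                          prior_a=(0.0, 0.0, 0.5), prior_b=(0.0, 0.5, 0.0))
# The empirical frequencies keep cycling in ever-longer runs and never settle.
```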

SLIDE 12

Repeated Shapley Game: Phase Diagram

[Figure: phase diagram over the simplex with vertices Rock, Paper and Scissors; the empirical play cycles between the vertices and does not converge.]

SLIDE 13

Part II: Extensions and approximations of fictitious play

SLIDE 14

Proposed extensions to fictitious play

Build forecasts not on the complete history, but on:

  • Recent data, say the m most recent rounds.
  • Discounted data, say with discount factor γ.
  • Perturbed data, say with error ǫ on individual observations.
  • Random samples of historical data, say random samples of size m.

Respond not necessarily with best replies, but:

  • ǫ-greedily.
  • Perturbed throughout, with small random shocks.
  • Randomly, proportional to expected payoff.

(A sketch of the first two forecast variants follows below.)
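Only the formation of the belief counts changes in the first list. A sketch of the first two variants (function and parameter names are mine):

```python
import numpy as np

def windowed_counts(opp_history, n_actions, m):
    """Belief counts from only the m most recent opponent actions."""
    counts = np.zeros(n_actions)
    for action in opp_history[-m:]:
        counts[action] += 1
    return counts

def discounted_counts(opp_history, n_actions, gamma):
    """Belief counts in which an observation that is t rounds old carries weight gamma**t."""
    counts = np.zeros(n_actions)
    weight = 1.0
    for action in reversed(opp_history):
        counts[action] += weight
        weight *= gamma
    return counts
```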

SLIDE 15

Framework for predictive learning (like fictitious play)

A forecasting rule for player i is a function that maps a history to a probability distribution over the opponents' actions in the next round: fi : H → ∆(X−i).

A response rule for player i is a function that maps a history to a probability distribution over i's own actions in the next round: gi : H → ∆(Xi).

A predictive learning rule for player i is the combination of a forecasting rule and a response rule, typically written (fi, gi).

  • This framework can be attributed to J.S. Jordan (1993).
  • Forecasting and response functions are deterministic.
  • Reinforcement and regret learning do not fit this framework: they are not concerned with prediction.
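In code, a predictive learning rule is simply a pair of functions with these signatures (a sketch; the type names are mine):

```python
from typing import Callable, Sequence, Tuple

History = Sequence[Tuple[int, ...]]                          # one joint action per round
Distribution = Sequence[float]                               # probabilities, one per action

ForecastRule = Callable[[History], Distribution]             # f_i : H -> Delta(X_-i)
ResponseRule = Callable[[History], Distribution]             # g_i : H -> Delta(X_i)
PredictiveLearningRule = Tuple[ForecastRule, ResponseRule]   # (f_i, g_i)
```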

SLIDE 16

Forecasting and response rules for fictitious play

Let ht ∈ Ht be a history of play up to and including round t, and let φjt =Def the empirical distribution of j's actions up to and including round t. Then the fictitious play forecasting rule is given by

    fi(ht) =Def ∏j≠i φjt.

Let fi be a fictitious play forecasting rule. Then gi is said to be a fictitious play response rule if all its values are best responses to the values of fi.

Remarks:

  • 1. Player i attributes a mixed strategy φjt to player j. This strategy reflects the number of times each action has been played by j.
  • 2. The mixed strategies of different opponents are assumed to be independent.
  • 3. Both (1) and (2) are simplifying assumptions.

SLIDE 17

Smoothed fictitious play

Notation:
  p−i : strategy profile of the opponents as predicted by fi in round t.
  ui(xi, p−i) : expected utility of action xi, given p−i.
  qi : strategy of player i in round t + 1, i.e., gi(ht).

Task: define qi, given p−i and ui(xi, p−i). Idea: respond randomly, but (somehow) proportionally to expected payoff. Elaborations of this idea:

a) Strictly proportional:

    qi(xi | p−i) =Def ui(xi, p−i) / ∑x′i∈Xi ui(x′i, p−i).

b) Through what is called mixed logit:

    qi(xi | p−i) =Def e^(ui(xi, p−i)/γi) / ∑x′i∈Xi e^(ui(x′i, p−i)/γi).
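Both elaborations are a few lines of code. A sketch (names are mine; subtracting the maximum is a standard numerical safeguard, not part of the slide):

```python
import numpy as np

def proportional_response(expected_payoffs):
    """(a) Probabilities strictly proportional to expected payoffs (assumed positive)."""
    u = np.asarray(expected_payoffs, float)
    return u / u.sum()

def logit_response(expected_payoffs, gamma):
    """(b) Mixed logit (quantal response) with temperature gamma > 0."""
    u = np.asarray(expected_payoffs, float) / gamma
    w = np.exp(u - u.max())        # shift by the max to avoid overflow; result unchanged
    return w / w.sum()

# The limiting behaviour of the next slide: large gamma spreads evenly,
# small gamma concentrates on the best reply.
print(logit_response([1.0, 0.9, 0.2], gamma=10.0))   # close to uniform
print(logit_response([1.0, 0.9, 0.2], gamma=0.01))   # nearly all mass on action 0
```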

SLIDE 18

Mixed logit, or quantal response function

  • Let d1 + · · · + dn = 1 and dj ≥ 0. Define

    logit(di) =Def e^(di/γ) / ∑j e^(dj/γ), where γ > 0.

  • The logit function can be seen as a soft maximum on n variables:

    γ ↓ 0 : logit “shares” 1 among all maximal di
    γ = 1 : logit weights the di exponentially (proportional to e^di)
    γ → ∞ : logit “spreads” 1 evenly among all di

Mixed logit can be justified in different ways:

a) On the basis of information and entropy arguments.
b) By assuming the dj are i.i.d. extreme value (a.k.a. log-Weibull) distributed.

Anderson et al. (1992): Discrete Choice Theory of Product Differentiation. Sec. 2.6.1: “Derivation of the Logit”.

SLIDE 19

Evenly (γ → ∞) → mixed logit → best response only (γ ↓ 0)

[Figure: logit response curves for decreasing γ, interpolating between the uniform distribution and the best response.]

As the figure shows, mixed logit respects best replies, but leaves room for experimentation.

SLIDE 20

Digression: Coding theory and entropy

This digression tries to answer the following question: why does play according to a diversified strategy yield more information than play according to a strategy in which only a few options are played?

  • To send 8 different binary-encoded messages costs 3 bits. The encoded messages are 000, 001, . . . , 111.
  • To encode 16 different messages, we would need log2 16 = 4 bits.
  • To encode 20 different messages, we would need ⌈log2 20⌉ = ⌈4.32⌉ = 5 bits.

If some messages are sent more frequently than others, it pays off to search for a code in which the messages that occur more frequently are represented by short code words (at the expense of the messages that are sent less frequently, which must then be represented by the remaining, longer code words).

SLIDE 21

Coding theory and entropy (continued)

Example. Suppose persons A and B work on a dark terrain. They are separated, and can only communicate in Morse code through a flashlight. A and B have agreed to send only the following messages:

  m1  Yes
  m2  No
  m3  All well?
  m4  Shall I come over?

A possible encoding could be

  Code 1:  m1 → 00, m2 → 01, m3 → 10, m4 → 11

SLIDE 22

Coding theory and entropy (continued)

Another encoding could be

  Code 2:  m1 → 0, m2 → 10, m3 → 110, m4 → 111

To prevent ambiguity, no code word may be a prefix of another code word. A useless encoding would be

  Code 3:  m1 → 0, m2 → 1, m3 → 00, m4 → 01

Under Code 3, the sequence 0101 may mean different things, such as m1, m2, m1, m2, or m1, m2, m4. (There are still other possibilities.)

  • The objective is to search for an efficient encoding, i.e., an encoding that minimises the expected number of bits per message.
  • If the relative frequency of the messages is known, we can compute, for every code, the expected number of bits per message, and hence its efficiency.

SLIDE 23

Coding theory and entropy [end of digression]

The following would be a plausible probability distribution:

  m1  Yes                 1/2
  m2  No                  1/4
  m3  All well?           1/8
  m4  Shall I come over?  1/8

For Code 2, E[number of bits] = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75. For Code 1, the expected number of bits is 2.0. Therefore, Code 2 is more efficient than Code 1.

Theorem (Noiseless Coding Theorem, Shannon). p1 log2(1/p1) + · · · + pn log2(1/pn) is a lower bound for the expected number of bits in an encoding of n messages with expected occurrence (p1, . . . , pn). This number is called the entropy of (p1, . . . , pn). Alternatively, entropy is −[p1 log2(p1) + · · · + pn log2(pn)].

The entropy of this distribution equals 1.75, which Code 2 attains. Therefore, Code 2 is optimal.
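The slide's arithmetic can be checked in a few lines (a sketch using the codes and the distribution above):

```python
import math

p = {"m1": 1/2, "m2": 1/4, "m3": 1/8, "m4": 1/8}
code1 = {"m1": "00", "m2": "01", "m3": "10", "m4": "11"}
code2 = {"m1": "0", "m2": "10", "m3": "110", "m4": "111"}

def expected_bits(code):
    """Expected code-word length under the message distribution p."""
    return sum(p[m] * len(word) for m, word in code.items())

entropy = sum(q * math.log2(1 / q) for q in p.values())
print(expected_bits(code1), expected_bits(code2), entropy)   # 2.0 1.75 1.75
```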

SLIDE 24

Smoothed fictitious play (Fudenberg & Levine, 1995)

Smoothed fictitious play is a generalisation of mixed logit. Let wi : ∆i → R be a function that “grades” i's probability distributions (over actions) under the following conditions:

  • 1. Grading is smooth (wi is infinitely often differentiable).
  • 2. Grading is strictly concave (a bump), in such a manner that ∇wi(qi) → ∞ (the grading becomes steep) whenever qi approaches the boundary of ∆i (whenever distributions become extremely uneven).

Let

    Ui(qi, p−i) =Def ui(qi, p−i) + γi · wi(qi).

Let fi be fictitious play forecasting and let gi correspond to a best response based on Ui. Then (fi, gi) is called smoothed fictitious play with smoothing function wi and smoothing parameter γi.
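A standard choice of smoothing function, consistent with the two conditions above though not spelled out on the slide, is the entropy wi(qi) = −∑xi qi(xi) ln qi(xi). Maximising Ui(qi, p−i) over the simplex with the constraint ∑xi qi(xi) = 1 (a short Lagrange-multiplier computation) gives, for each action xi,

    ui(xi, p−i) − γi (ln qi(xi) + 1) − μ = 0,  hence  qi(xi) ∝ e^(ui(xi, p−i)/γi),

which is exactly the mixed-logit response of SLIDE 17. In this sense smoothed fictitious play generalises mixed logit.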

SLIDE 25

Smoothed fictitious play limits regret

Theorem (Fudenberg & Levine, 1995). Let G be a finite game and let ǫ > 0. If a given player uses smoothed fictitious play with a sufficiently small smoothing parameter, then with probability one his regrets are bounded above by ǫ.

– Young does not reproduce the proof of Fudenberg et al., but shows that in this case ǫ-regret can be derived from a later and more general result of Hart and Mas-Colell (2001).
– This later result identifies a large family of rules that eliminate regret, based on an extension of Blackwell's approachability theorem. (Roughly, Blackwell's approachability theorem generalises maxmin reasoning to vector-valued payoffs.)

Fudenberg & Levine (1995). “Consistency and cautious fictitious play,” Journal of Economic Dynamics and Control, Vol. 19(5-7), pp. 1065-1089.
Hart & Mas-Colell (2001). “A General Class of Adaptive Strategies,” Journal of Economic Theory, Vol. 98(1), pp. 26-54.

SLIDE 26

Smoothed fictitious play converges to ǫ-CCE

  • Definition. A coarse correlated equilibrium (CCE) is a probability distribution on strategy profiles, q ∈ ∆(X), such that no player can opt out (to gain expected utility) before q is made known. In a coarse correlated ǫ-equilibrium (ǫ-CCE), no player can opt out to gain more than ǫ in expectation.

Theorem (Fudenberg & Levine, 1995). Let G be a finite game and let ǫ > 0. If all players use smoothed fictitious play with sufficiently small smoothing parameters, then with probability one empirical play will converge to the set of coarse correlated ǫ-equilibria.

Summary of the two theorems: smoothed fictitious play limits regret and converges to ǫ-CCE. There is another learning method with no regret and convergence to zero-CCE . . .

SLIDE 27

There are more coarse correlated equilibria than correlated equilibria than Nash equilibria

Simple coordination game:

              Other: Left | Other: Right
  You: Left   (1, 1)      | (0, 0)
  You: Right  (0, 0)      | (1, 1)

[Figure: the equilibrium sets of this game. In this picture, CCE = CE.]

SLIDE 28

Exponentiated regret matching

Let
  j : an action, where 1 ≤ j ≤ k;
  ūit : i's realised average payoff up to and including round t;
  φ−it : the realised joint empirical distribution of i's opponents;
  ūi(j, φ−it) : i's hypothetical average payoff for playing action j against φ−it;
  r̄it : player i's regret vector in round t, with components r̄it(j) = ūi(j, φ−it) − ūit.

Exponentiated regret matching (PY, p. 59) is defined as

    qi(t+1)(j) ∝ [r̄it(j)]+^a, where a > 0.

(For a = 1 we have ordinary regret matching.) An extended theorem on regret matching (Mas-Colell et al., 2001) ensures that individual players have no regret with probability one, and that the empirical distribution of play converges to the set of coarse correlated equilibria (PY, p. 60).
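A sketch of the response rule in isolation, assuming the regret vector r̄it has already been computed; the uniform fallback when no regret is positive is my assumption (conventions differ):

```python
import numpy as np

def exponentiated_regret_matching(regret_vector, a):
    """Play each action with probability proportional to its positive regret raised to a."""
    r_plus = np.maximum(np.asarray(regret_vector, float), 0.0) ** a
    total = r_plus.sum()
    if total == 0.0:               # no positive regret; fallback is an assumption: mix uniformly
        return np.full(len(r_plus), 1 / len(r_plus))
    return r_plus / total

print(exponentiated_regret_matching([0.4, -0.1, 0.2], a=1.0))  # ordinary regret matching: [2/3, 0, 1/3]
print(exponentiated_regret_matching([0.4, -0.1, 0.2], a=8.0))  # nearly all mass on the best action
```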

SLIDE 29

FP vs. Smoothed FP vs. Exponentiated RM

Fictitious play
  • Plays best responses.
  • Does depend on past play of the opponent(s).
  • Puts zero probability on sub-optimal responses.

Smoothed fictitious play
  • Plays sub-optimal responses, e.g., softmax-proportionally to their estimated payoffs.
  • Does depend on past play of the opponent(s).
  • Puts non-zero probability on sub-optimal responses.
  • Approaches fictitious play when γi ↓ 0 (PY, p. 84).

Exponentiated regret matching
  • Plays sub-optimal responses, i.e., proportionally to a power of positive regret.
  • Does depend on own past payoffs.
  • Puts non-zero probability on sub-optimal responses.
  • Approaches fictitious play when the exponent a → ∞ (PY, p. 84).

SLIDE 30

FP vs. Smoothed FP vs. Exponentiated RM

                                               | FP     | Smoothed FP                              | Exponentiated RM
Depends on past play of opponents              | √      | √                                        | −
Depends on own past payoffs                    | −      | −                                        | √
Puts zero probability on sub-optimal responses | √      | −                                        | −
Best response                                  | always | when smoothing parameter γi ↓ 0          | when exponent a → ∞
Individual no-regret                           | −      | within ǫ > 0, almost always (PY, p. 82)  | exact, almost always (PY, p. 60)
Collective convergence to coarse correlated    | −      | within ǫ > 0, almost always (PY, p. 83)  | exact, almost always (PY, p. 60)
equilibria                                     |        |                                          |

SLIDE 31

Part III: Finite memory and inertia

SLIDE 32

Finite memory: motivation

  • In their basic version, most learning rules rely on the entire history of play.
  • People, as well as computers, have finite memory. (On the other hand, for average or discounted payoffs this is no real problem.)
  • Nevertheless: experiences in the distant past are apt to be less relevant than more recent ones.
  • Idea: let players have a finite memory of m rounds.

SLIDE 33

Inertia: motivation

  • When players' strategies are constantly re-evaluated, discontinuities in behaviour are likely to occur. Example: the asymmetric coordination game.
  • Discontinuities in behaviour are less likely to lead to equilibria of any sort.
  • Idea: let players play the same action as in the previous round with probability λ.

SLIDE 34

Weakly acyclic games

  • Game G with action space X.
  • Better-reply graph G′ = (V, E), where V = X and

    E = { (x, y) | for some i : y−i = x−i and ui(yi, y−i) > ui(xi, y−i) }.

  • For all x ∈ X: x is a sink iff x is a (pure) Nash equilibrium.
  • G is said to be weakly acyclic under better replies if every node is connected to a sink.
  • Weakly acyclic under better replies ⇒ ∃ Nash equilibrium.

Example (read as a 3 × 3 bimatrix, row player's payoff first):

  (1, 1) (2, 4) (4, 2)
  (1, 1) (4, 2) (2, 4)
  (3, 3) (1, 1) (1, 1)
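Sinks of the better-reply graph can be found mechanically. A sketch for the example above (the 3 × 3 reading of the flattened matrix is my assumption):

```python
import numpy as np
from itertools import product

# The bimatrix of this slide, split into row-player and column-player payoffs.
R = np.array([[1, 2, 4], [1, 4, 2], [3, 1, 1]])   # row player
C = np.array([[1, 4, 2], [1, 2, 4], [3, 1, 1]])   # column player

edges = []
for i, j in product(range(3), range(3)):
    edges += [((i, j), (i2, j)) for i2 in range(3) if R[i2, j] > R[i, j]]   # row player improves
    edges += [((i, j), (i, j2)) for j2 in range(3) if C[i, j2] > C[i, j]]   # column player improves

sinks = set(product(range(3), range(3))) - {x for x, _ in edges}
print(sinks)   # {(2, 0)}: the profile with payoffs (3, 3), the unique sink / pure Nash equilibrium
```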

SLIDE 35

Examples of weakly acyclic games

Coordination games. Two-person games with identical action sets for both players, where the best responses form the diagonal of the joint action space.

Potential games (Monderer and Shapley, 1996). There is a function ρ : X → R, called the potential, such that for every player i and all action profiles x, y ∈ X:

    y−i = x−i ⇒ ui(yi, y−i) − ui(xi, x−i) = ρ(y) − ρ(x).

Example: congestion games.

Why potential games are weakly acyclic: the potential strictly increases along every better-reply path ⇒ paths cannot cycle ⇒ in finite graphs, every better-reply path must end (in a sink).

SLIDE 36

Weakly acyclic games under finite memory and inertia

  • Theorem. Let G be a finite weakly acyclic n-person game. Every better-reply process with finite memory and inertia converges to a pure Nash equilibrium of G.
  • Proof (outline).
  • 1. Let the state space Z be X^m (the last m joint actions).
  • 2. A state x̄ ∈ X^m is called homogeneous if it consists of m identical action profiles x; such a state is denoted by x̄. Let Z∗ =Def { homogeneous states }.
  • 3. It will be shown below that the process hits Z∗ infinitely often.
  • 4. It will be shown below that the overall probability to play any action is bounded away from zero.
  • 5. It can easily be seen that the absorbing states are exactly Z∗ ∩ Pure Nash, i.e., the homogeneous states built from pure Nash equilibria.
  • 6. It will be shown that, due to weak acyclicity, inertia, and (4), the process eventually lands in an absorbing state which, by (5), is a repeated pure Nash equilibrium.

(A simulation sketch of such a process follows below.)
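The following minimal simulation is one concrete instance of the class of processes the theorem covers (all parameter choices and the specific better-reply rule — a uniformly random strict better reply against a profile sampled from memory — are my assumptions):

```python
import numpy as np

def better_reply_process(R, C, m=3, lam=0.5, rounds=2000, seed=0):
    """Better-reply play with memory m and inertia lam on a bimatrix game (R, C).

    Each round, each player independently repeats its previous action with
    probability lam (inertia); otherwise it plays a uniformly random strict
    better reply against a profile drawn from its memory of the last m
    rounds, keeping the current action if no better reply exists."""
    rng = np.random.default_rng(seed)
    n_r, n_c = R.shape
    memory = [(int(rng.integers(n_r)), int(rng.integers(n_c)))]

    def move(current, my_payoffs):
        if rng.random() < lam:
            return current                                   # inertia
        better = np.flatnonzero(my_payoffs > my_payoffs[current])
        return int(rng.choice(better)) if len(better) else current

    for _ in range(rounds):
        i, j = memory[-1]
        ref_i, ref_j = memory[rng.integers(len(memory))]     # sample a remembered profile
        memory.append((move(i, R[:, ref_j]), move(j, C[ref_i, :])))
        memory = memory[-m:]
    return memory[-1]

# The weakly acyclic example from SLIDE 34: the process typically settles in
# the pure Nash equilibrium with payoffs (3, 3), i.e. indices (2, 0).
R = np.array([[1, 2, 4], [1, 4, 2], [3, 1, 1]])
C = np.array([[1, 4, 2], [1, 2, 4], [3, 1, 1]])
print(better_reply_process(R, C))
```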

SLIDE 37

First claim: the process hits Z∗ infinitely often

Let inertia be determined by λ > 0. Then Pr(all players repeat their previous action in a given round) ≥ λ^n, hence Pr(all players repeat their previous action during m subsequent rounds) ≥ λ^(nm). If all players repeat their previous action during m subsequent rounds, the process arrives at a homogeneous state. Hence, for all t,

    Pr(the process is in a homogeneous state at round t + m) ≥ λ^(nm).

Infinitely many disjoint histories of length m occur, hence infinitely many independent events “homogeneous at t + m” occur. Apply the (second) Borel–Cantelli lemma: if {En}n are independent events and ∑∞n=1 Pr(En) is unbounded, then Pr(an infinite number of the En occur) = 1.

SLIDE 38

Second claim: every action is played with probability at least γ > 0

A better-reply learning method maps states (finite histories) to strategies (probability distributions over actions), γi : Z → ∆(Xi), and possesses the following important properties: i) it is deterministic as a map from states to mixed strategies, and ii) every action is played with positive probability.

  • 1. Hence, let γi = inf{ γi(z)(xi) | z ∈ Z, xi ∈ Xi }. Since Z and Xi are finite, the “inf” is a “min”, and γi > 0.
  • 2. Similarly, let γ = inf{ γi | 1 ≤ i ≤ n }. Since there are finitely many players, the “inf” is a “min”, and γ > 0.

SLIDE 39

Final claim: the overall probability to reach a sink from Z∗ is positive

Suppose the process is in homogeneous state x̄.

  • 1. If x is pure Nash, we are done, because response functions are deterministic better replies: no player has a better reply, so x is repeated.
  • 2. If x is not pure Nash, there must be an edge x → y in the better-reply graph. Suppose this edge concerns player i, who switches to action yi. We know that yi is played with probability at least γ, irrespective of player and state.

Further probabilities:

  • All other players j ≠ i keep playing the same action: probability ≥ λ^(n−1).
  • Edge x → y is actually traversed: probability ≥ γλ^(n−1).
  • Profile y is maintained for another m − 1 rounds, so as to arrive at the homogeneous state ȳ: probability ≥ λ^(n(m−1)).
  • Hence, to travel from x̄ to ȳ: probability ≥ γλ^(n−1) · λ^(n(m−1)) = γλ^(nm−1).
  • The image x̄(1), . . . , x̄(l) of a better-reply path x(1), . . . , x(l) is followed to a sink with probability ≥ (γλ^(nm−1))^L, where L is the length of a longest better-reply path.

Since Z∗ is encountered infinitely often, the result follows.

SLIDE 40

Summary

  • With fictitious play, the behaviour of opponents is modelled by (or represented by, or projected onto) a single mixed strategy.
  • Fictitious play ignores sub-optimal actions.
  • There is a family of so-called better-reply learning rules that i) play sub-optimal actions, and ii) can be brought arbitrarily close to fictitious play.
  • In weakly acyclic n-person games, every better-reply process with finite memory and inertia converges to a pure Nash equilibrium.

SLIDE 41

What next?

Bayesian play:

  • With fictitious play, the behaviour of opponents is modelled by a single mixed strategy.
  • With Bayesian play, opponents are modelled by a probability distribution over a set of mixed strategies.

Gradient dynamics:

  • As in fictitious play, players model (or assess) each other through mixed strategies.
  • Strategies are not played, only maintained.
  • Due to CKR (common knowledge of rationality, cf. Hargreaves Heap & Varoufakis, 2004), all models of mixed strategies are correct (i.e., q−i = s−i, for all i).
  • Players gradually adapt their mixed strategies through hill-climbing in the payoff space.