Multi-agent learning: Fictitious Play

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.


Slide 1. Multi-agent learning: Fictitious Play

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Slides last processed on Tuesday 2nd March, 2010 at 13:53h.

Slide 2. Fictitious play: motivation

  • Rather than considering your own payoffs, monitor the behaviour of your opponent(s), and respond optimally.
  • The behaviour of an opponent is projected on a single mixed strategy.
  • Brown (1951): explanation for Nash equilibrium play. In terms of current use, the name is a bit of a misnomer, since play actually occurs (Berger, 2005).
  • One of the most important, if not the most important, representatives of a follower strategy.

Slide 3. Plan for today

Part I. Best reply strategy

  1. Pure fictitious play.
  2. Results that connect pure fictitious play to Nash equilibria.

Part II. Extensions and approximations of fictitious play

  1. Smoothed fictitious play.
  2. Exponential regret matching.
  3. No-regret property of smoothed fictitious play (Fudenberg et al., 1995).
  4. Convergence of better-reply strategies when players have limited memory and are inert [tend to stick to their current strategy] (Peyton Young, XXX).

Literature:

  • Shoham et al. (2009): Multi-agent Systems. Ch. 7: “Learning and Teaching”.
  • H. Peyton Young (2004): Strategic Learning and its Limits, Oxford UP.
  • D. Fudenberg and D.K. Levine (1998): The Theory of Learning in Games, MIT Press.

Slide 4. Part I: Pure fictitious play

Slide 5. Repeated Coordination Game

Players receive payoff p > 0 iff they coordinate. This game possesses three Nash equilibria, viz. (0, 0), (0.5, 0.5), and (1, 1).

  Round   A’s action   B’s action   A’s beliefs   B’s beliefs
    0.       —            —         (0.0, 0.0)    (0.0, 0.0)
  * 1.       A            B         (0.0, 1.0)    (1.0, 0.0)
    2.       B            A         (1.0, 1.0)    (1.0, 1.0)
  * 3.       A            B         (1.0, 2.0)    (2.0, 1.0)
    4.       B            A         (2.0, 2.0)    (2.0, 2.0)
  * 5.       B            B         (2.0, 3.0)    (2.0, 3.0)
    6.       B            B         (2.0, 4.0)    (2.0, 4.0)
    7.       B            B         (2.0, 5.0)    (2.0, 5.0)
   ...      ...          ...           ...           ...

(An asterisk marks a round in which beliefs were tied, so that best replies were chosen by tie-breaking. Beliefs are counts of the opponent playing (A, B).)
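The dynamics in the table can be reproduced with a small simulation. The sketch below is illustrative: function names are ours, beliefs are action counts as in the table, and ties are assumed to be broken by a fair coin flip.

```python
import random

def best_reply(beliefs):
    """Best reply in the pure coordination game: play the action the
    opponent is believed to play most often; break ties at random."""
    a_count, b_count = beliefs
    if a_count > b_count:
        return "A"
    if b_count > a_count:
        return "B"
    return random.choice(["A", "B"])

def fictitious_play(rounds, seed=0):
    random.seed(seed)
    beliefs_about_B = [0.0, 0.0]   # A's counts of B playing (A, B)
    beliefs_about_A = [0.0, 0.0]   # B's counts of A playing (A, B)
    history = []
    for _ in range(rounds):
        a_act = best_reply(beliefs_about_B)
        b_act = best_reply(beliefs_about_A)
        beliefs_about_B["AB".index(b_act)] += 1.0   # A observes B's action
        beliefs_about_A["AB".index(a_act)] += 1.0   # B observes A's action
        history.append((a_act, b_act))
    return history

history = fictitious_play(20)
print(history[-3:])
```

As in the table, play may miscoordinate while beliefs are tied, but the first coordinated round is a steady state: from then on both players best-reply with the coordinated action forever.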

Slide 6. Steady states are pure (but possibly weak) Nash equilibria

Definition (Steady state). An action profile a is a steady state (or absorbing state) of fictitious play if, whenever a is played at round t, it is also played at round t + 1.

  • Theorem. If a pure strategy profile is a steady state of fictitious play, then it is a (possibly weak) Nash equilibrium in the stage game.

  • Proof. Suppose s is a steady state of fictitious play. Consequently, i’s opponent model converges to s−i, for all i. If s were not Nash, one of the players would eventually deviate from si, which would contradict the assumption that s is a steady state.*

  • In practice, the resulting Nash equilibrium is often strict, because a weak equilibrium is unlikely to maintain the process in a steady state.

*A proof ad absurdum is not a preferred route, but sometimes it is more intuitive.

Slide 7. Pure strict Nash equilibria are steady states

  • Theorem. If a pure strategy profile is a strict Nash equilibrium of a stage game, then it is a steady state of fictitious play in the repeated game.

Notice the use of terminology: “pure strategy profile” for Nash equilibria; “action profile” for steady states.

  • Proof. Suppose s is a pure strict Nash equilibrium. Because s is pure, each si is deterministic (not a mix). Suppose s is played at round t. Because s is Nash, a best response to s−i is action si. (There might be others!) Because s is a strict equilibrium, si is the unique best response to s−i. Because this argument holds for each i, action profile s will be played in round t + 1 again.

  • Summary of the two theorems: pure strict Nash ⇒ steady state ⇒ pure Nash. But what if pure Nash equilibria do not exist?

Slide 8. Repeated game of Matching Pennies

Zero-sum game. A’s goal is to have the pennies matched.

  Round   A’s action   B’s action   A’s beliefs   B’s beliefs
    0.       —            —         (1.5, 2.0)    (2.0, 1.5)
    1.       T            T         (1.5, 3.0)    (2.0, 2.5)
    2.       T            H         (2.5, 3.0)    (2.0, 3.5)
    3.       T            H         (3.5, 3.0)    (2.0, 4.5)
    4.       H            H         (4.5, 3.0)    (3.0, 4.5)
    5.       H            H         (5.5, 3.0)    (4.0, 4.5)
    6.       H            H         (6.5, 3.0)    (5.0, 4.5)
    7.       H            T         (6.5, 4.0)    (6.0, 4.5)
    8.       H            T         (6.5, 5.0)    (7.0, 4.5)
   ...      ...          ...           ...           ...

(Beliefs are weights on the opponent playing (H, T).)

Slide 9. Convergent empirical distribution of strategies

  • Theorem. If the empirical distribution of each player’s strategies converges in fictitious play, then it converges to a Nash equilibrium.

  • Proof. Same as before. If the empirical distributions converge to s, then i’s opponent model converges to s−i, for all i. If s were not Nash, one of the players would eventually deviate from si, which would contradict the convergence of the empirical distribution.

Remarks:

  1. The si may be mixed.
  2. It actually suffices that the s−i converge asymptotically to the actual distribution.
  3. Even if empirical distributions converge (hence, converge to a Nash equilibrium), the actually played responses per stage need not be Nash equilibria of the stage game.
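The Matching Pennies game of Slide 8 illustrates both the theorem and remark 3: stage play keeps cycling, yet the empirical frequencies tend to the mixed Nash equilibrium. A sketch (the matcher/mismatcher roles, action encoding, and tie-breaking are our assumptions; the fractional initial belief weights are those of the Slide 8 table):

```python
def matching_pennies_fp(rounds):
    # Beliefs are weights on the opponent playing (H, T).
    beliefs_about_B = [1.5, 2.0]   # A's model of B
    beliefs_about_A = [2.0, 1.5]   # B's model of A
    heads_by_A = 0
    for _ in range(rounds):
        # A (the matcher) plays B's believed-more-likely action;
        # B (the mismatcher) plays the opposite of A's believed-more-likely action.
        a = 0 if beliefs_about_B[0] >= beliefs_about_B[1] else 1   # 0 = H, 1 = T
        b = 1 if beliefs_about_A[0] >= beliefs_about_A[1] else 0
        beliefs_about_B[b] += 1.0
        beliefs_about_A[a] += 1.0
        heads_by_A += (a == 0)
    return heads_by_A / rounds

print(matching_pennies_fp(10000))   # close to 0.5, the mixed Nash frequency
```

No single stage profile is ever a Nash equilibrium of Matching Pennies in pure actions, yet the empirical frequency of Heads approaches 0.5 for both players.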

Slide 10. Empirical distributions converge to Nash ⇏ stage Nash

Repeated Coordination Game. Players receive payoff p > 0 iff they coordinate.

  Round   A’s action   B’s action   A’s beliefs   B’s beliefs
    0.       —            —         (0.5, 1.0)    (1.0, 0.5)
    1.       B            A         (1.5, 1.0)    (1.0, 1.5)
    2.       A            B         (1.5, 2.0)    (2.0, 1.5)
    3.       B            A         (2.5, 2.0)    (2.0, 2.5)
    4.       A            B         (2.5, 3.0)    (3.0, 2.5)
   ...      ...          ...           ...           ...

  • This game possesses three equilibria, viz. (0, 0), (0.5, 0.5), and (1, 1), with expected payoffs p, p/2, and p, respectively.
  • The empirical distribution of play converges to (0.5, 0.5), but with realised payoff 0 rather than p/2.

Slide 11. Empirical distribution of play does not need to converge

Rock-paper-scissors. The winner receives payoff p > 0; otherwise the payoff is zero.

  • Rock-paper-scissors with these payoffs is known as the Shapley game.
  • The Shapley game possesses one equilibrium, viz. (1/3, 1/3, 1/3), with expected payoff p/3.

  Round   A’s action   B’s action   A’s beliefs       B’s beliefs
    0.       —            —        (0.0, 0.0, 0.5)   (0.0, 0.5, 0.0)
    1.     Rock        Scissors    (0.0, 0.0, 1.5)   (1.0, 0.5, 0.0)
    2.     Rock        Paper       (0.0, 1.0, 1.5)   (2.0, 0.5, 0.0)
    3.     Rock        Paper       (0.0, 2.0, 1.5)   (3.0, 0.5, 0.0)
    4.     Scissors    Paper       (0.0, 3.0, 1.5)   (3.0, 0.5, 1.0)
    5.     Scissors    Paper       (0.0, 4.0, 1.5)   (3.0, 0.5, 2.0)
   ...      ...          ...            ...               ...

(Beliefs are weights on the opponent playing (Rock, Paper, Scissors).)
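The non-convergence can be checked by simulation. A sketch (the action encoding and the tie-breaking rule are our assumptions; the initial belief weights are those of the table): the runs of repeated joint actions grow longer and longer, so play keeps cycling through the action space and the stage actions never settle.

```python
def shapley_fp(rounds):
    # Actions 0 = Rock, 1 = Paper, 2 = Scissors; beats[a] is the action a beats.
    beats = {0: 2, 1: 0, 2: 1}

    def best_reply(beliefs):
        # Expected payoff of action a is proportional to the belief weight
        # of the action it beats; ties are broken by the lowest index.
        return max(range(3), key=lambda a: (beliefs[beats[a]], -a))

    beliefs_about_B = [0.0, 0.0, 0.5]   # weights on (Rock, Paper, Scissors)
    beliefs_about_A = [0.0, 0.5, 0.0]
    runs, prev, run_len = [], None, 0   # lengths of maximal runs of one joint action
    for _ in range(rounds):
        a = best_reply(beliefs_about_B)
        b = best_reply(beliefs_about_A)
        beliefs_about_B[b] += 1.0
        beliefs_about_A[a] += 1.0
        if (a, b) == prev:
            run_len += 1
        else:
            if prev is not None:
                runs.append(run_len)
            prev, run_len = (a, b), 1
    runs.append(run_len)
    return runs

runs = shapley_fp(5000)
print(runs[:6], "...", runs[-3:])   # run lengths keep growing: no convergence
```

The first rounds of the simulation reproduce the table above (Rock/Scissors, then Rock/Paper, etc.).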

Slide 12. Repeated Shapley Game: Phase Diagram

[Phase diagram of beliefs in the repeated Shapley game; the three corners of the simplex are Rock, Paper, and Scissors.]

Slide 13. Part II: Extensions and approximations of fictitious play

Slide 14. Proposed extensions to fictitious play

Build forecasts, not on the complete history, but on:

  • Recent data, say the m most recent rounds.
  • Discounted data, say with discount factor γ.
  • Perturbed data, say with error ε on individual observations.
  • Random samples of historical data, say random samples of size m.

Give not necessarily best responses, but respond:

  • ε-greedily.
  • Perturbed throughout, with small random shocks.
  • Randomly, proportional to expected payoff.

Slide 15. Framework for predictive learning (like fictitious play)

A forecasting rule for player i is a function that maps a history to a probability distribution over the opponents’ actions in the next round: f_i : H → ∆(X−i).

A response rule for player i is a function that maps a history to a probability distribution over i’s own actions in the next round: g_i : H → ∆(X_i).

A predictive learning rule for player i is the combination of a forecasting rule and a response rule, typically written (f_i, g_i).

  • This framework can be attributed to J.S. Jordan (1993).
  • Forecasting and response functions are deterministic.
  • Reinforcement and regret learning do not fit this framework: they are not involved with prediction.
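The framework can be rendered as a pair of function types, with fictitious play as one instance. A minimal sketch (type names and helper functions are ours, not the slides'):

```python
from typing import Callable, Dict, List, Tuple

# A history is a list of joint actions; a distribution maps actions to
# probabilities.
History = List[Tuple[str, str]]
Dist = Dict[str, float]
ForecastingRule = Callable[[History], Dist]   # f_i : H -> Delta(X_-i)
ResponseRule = Callable[[History], Dist]      # g_i : H -> Delta(X_i)

def fp_forecast(history: History, opponent: int = 1) -> Dist:
    """Fictitious-play forecast: the empirical distribution of the
    opponent's past actions."""
    if not history:
        return {}
    counts: Dict[str, float] = {}
    for joint in history:
        counts[joint[opponent]] = counts.get(joint[opponent], 0.0) + 1.0
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def fp_response(history: History, payoff, actions) -> Dist:
    """Fictitious-play response: all mass on a best reply to the forecast."""
    forecast = fp_forecast(history)
    expected = lambda a: sum(p * payoff(a, b) for b, p in forecast.items())
    return {max(actions, key=expected): 1.0}

# Coordination game: payoff 1 iff the actions coincide.
coordinate = lambda a, b: 1.0 if a == b else 0.0
hist: History = [("A", "B"), ("B", "B"), ("B", "B")]
print(fp_forecast(hist))                            # {'B': 1.0}
print(fp_response(hist, coordinate, ["A", "B"]))    # {'B': 1.0}
```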

Slide 16. Forecasting and response rules for fictitious play

Let h^t ∈ H^t be a history of play up to and including round t, and let φ_j^t =def the empirical distribution of j’s actions up to and including round t. Then the fictitious play forecasting rule is given by

  f_i(h^t) =def ∏_{j ≠ i} φ_j^t.

Let f_i be a fictitious play forecasting rule. Then g_i is said to be a fictitious play response rule if all of its values are best responses to the values of f_i.

Remarks:

  1. Player i attributes a mixed strategy φ_j^t to player j. This strategy reflects the number of times each action has been played by j.
  2. The mixed strategies are assumed to be independent.
  3. Both (1) and (2) are simplifying assumptions.

Slide 17. Smoothed fictitious play

Notation:

  p−i : strategy profile of the opponents as predicted by f_i in round t.
  u_i(x_i, p−i) : expected utility of action x_i, given p−i.
  q_i : strategy profile of player i in round t + 1, i.e., g_i.

Task: define q_i given p−i and u_i(x_i, p−i). Idea: respond randomly, but (somehow) proportional to expected payoff. Elaborations of this idea:

a) Strictly proportional:

  q_i(x_i | p−i) =def u_i(x_i, p−i) / ∑_{x′_i ∈ X_i} u_i(x′_i, p−i).

b) Through what is called mixed logit:

  q_i(x_i | p−i) =def e^{u_i(x_i, p−i)/γ_i} / ∑_{x′_i ∈ X_i} e^{u_i(x′_i, p−i)/γ_i}.
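Both elaborations can be sketched in a few lines (function names are ours; utilities for rule (a) are assumed non-negative, since strict proportionality is otherwise undefined):

```python
import math

def proportional_response(utilities):
    """Rule (a): probabilities strictly proportional to expected utility.
    `utilities` maps each own action x_i to u_i(x_i, p_-i) >= 0."""
    total = sum(utilities.values())
    return {a: u / total for a, u in utilities.items()}

def logit_response(utilities, gamma):
    """Rule (b), mixed logit: probabilities proportional to exp(u / gamma)."""
    m = max(utilities.values())   # subtract the max for numerical stability
    weights = {a: math.exp((u - m) / gamma) for a, u in utilities.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

u = {"A": 1.0, "B": 3.0}
print(proportional_response(u))         # {'A': 0.25, 'B': 0.75}
print(logit_response(u, gamma=0.1))     # almost all mass on B (soft maximum)
print(logit_response(u, gamma=100.0))   # almost uniform
```

The two printed logit distributions preview the limit behaviour of γ discussed on the next slide.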

Slide 18. Mixed logit, or quantal response function

  • Let d_1 + · · · + d_n = 1 with d_j ≥ 0, and define

  logit(d_i) =def e^{d_i/γ} / ∑_j e^{d_j/γ},  where γ > 0.

  • The logit function can be seen as a soft maximum on n variables:
    γ ↓ 0 : logit “shares” 1 among all maximal d_i;
    γ = 1 : logit is strictly proportional;
    γ → ∞ : logit “spreads” 1 evenly among all d_i.

  • Mixed logit can be justified in different ways:
    a) on the basis of information and entropy arguments;
    b) by assuming the d_j are i.i.d. extreme value distributed. This distribution arises as the limit of the maximum of n independent random variables (each exponentially distributed).

Slide 19. Digression: Coding theory and entropy

In this digression we try to answer the following question: why does play according to a diversified strategy yield more information than play according to a strategy where only a few options are played?

  • To send 8 different binary encoded messages costs 3 bits; the encoded messages are 000, 001, . . . , 111.
  • To encode 16 different messages, we need log2 16 = 4 bits.
  • To encode 20 different messages, we need ⌈log2 20⌉ = ⌈4.32⌉ = 5 bits.

If some messages are sent more frequently than others, it pays off to search for a code in which messages that occur more frequently are represented by short code words (at the expense of messages that are sent less frequently, which must then be represented by the remaining, longer code words).

Slide 20. Coding theory and entropy (continued)

Example. Suppose persons A and B work on a dark terrain. They are separated, and can only communicate in Morse code through a flashlight. A and B have agreed to send only the following messages:

  m1  Yes
  m2  No
  m3  All well?
  m4  Shall I come to you?

A possible encoding could be Code 1:

  m1 → 00    m2 → 01    m3 → 10    m4 → 11

Slide 21. Coding theory and entropy (continued)

Another encoding could be Code 2:

  m1 → 0    m2 → 10    m3 → 110    m4 → 111

To prevent ambiguity, no code word may be a prefix of another code word. A useless coding would be Code 3:

  m1 → 0    m2 → 1    m3 → 00    m4 → 01

Under Code 3, the sequence 0101 may mean different things, such as m1, m2, m1, m2, or m1, m2, m4. (There are still other possibilities.)

  • The objective is to search for an efficient encoding, i.e., an encoding that minimises the expected number of bits per message.
  • If the relative frequency of messages is known, we can compute for every code the expected number of bits per message, and hence its efficiency.

Slide 22. Coding theory and entropy [end of digression]

The following would be a plausible probability distribution:

  m1  Yes                   1/2
  m2  No                    1/4
  m3  All well?             1/8
  m4  Shall I come to you?  1/8

For Code 2, E[number of bits] = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75. For Code 1, the expected number of bits is 2.0. Therefore, Code 2 is more efficient than Code 1.

Theorem (Noiseless Coding Theorem, Shannon). p1 log2(1/p1) + · · · + pn log2(1/pn) is a lower bound for the expected number of bits in an encoding of n messages with expected occurrence (p1, . . . , pn). This number is called the entropy of (p1, . . . , pn). Alternatively, entropy is −[p1 log2(p1) + · · · + pn log2(pn)].

The entropy of this distribution is 1.75, which Code 2 attains. Therefore, Code 2 is optimal.
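The computation on this slide is easy to check mechanically. A small sketch (the probabilities and codes are those of the digression; Code 2's one-bit codeword for m1 is implied by the slide's own calculation):

```python
import math

# Message probabilities and the two codes discussed in the digression.
probs = {"m1": 0.5, "m2": 0.25, "m3": 0.125, "m4": 0.125}
code1 = {"m1": "00", "m2": "01", "m3": "10", "m4": "11"}
code2 = {"m1": "0", "m2": "10", "m3": "110", "m4": "111"}

def expected_bits(code, probs):
    """Expected number of bits per message under a given code."""
    return sum(probs[m] * len(word) for m, word in code.items())

def entropy(probs):
    """Shannon entropy: a lower bound on the expected bits per message
    of any unambiguous (prefix-free) code."""
    return -sum(p * math.log2(p) for p in probs.values())

print(expected_bits(code1, probs))   # 2.0
print(expected_bits(code2, probs))   # 1.75
print(entropy(probs))                # 1.75: Code 2 attains the bound
```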

Slide 23. Smoothed fictitious play (Fudenberg & Levine, 1995)

Smoothed fictitious play is a generalisation of mixed logit. Let w_i : ∆_i → R be a function that “grades” i’s probability distributions (over actions), under the following conditions:

  1. Grading is smooth (w_i is infinitely often differentiable).
  2. Grading is strictly concave (a bump), in such a manner that ∇w_i(q_i) → ∞ (steep) whenever q_i approaches the boundary of ∆_i (whenever distributions become extremely uneven).

Let

  U_i(q_i, p−i) =def u_i(q_i, p−i) + γ_i · w_i(q_i).

Let f_i be fictitious play forecasting and let g_i correspond to a best response based on U_i. Then (f_i, g_i) is called smoothed fictitious play with smoothing function w_i and smoothing parameter γ_i.

Slide 24. Smoothed fictitious play limits regret

Theorem (Fudenberg & Levine, 1995). Let G be a finite game and let ε > 0. If a given player uses smoothed fictitious play with a sufficiently small smoothing parameter, then with probability one his regrets are bounded above by ε.

  – Peyton Young does not reproduce the proof of Fudenberg et al., but shows that in this case ε-regret can be derived from a later and more general result of Hart and Mas-Colell (2001).
  – This later result identifies a large family of rules that eliminate regret, based on an extension of Blackwell’s approachability theorem.
  – Roughly, Blackwell’s approachability theorem generalises maxmin reasoning to vector-valued payoffs.

Fudenberg & Levine (1995). “Consistency and cautious fictitious play,” Journal of Economic Dynamics and Control, Vol. 19(5-7), pp. 1065-1089.
Hart & Mas-Colell (2001). “A General Class of Adaptive Strategies,” Journal of Economic Theory, Vol. 98(1), pp. 26-54.

Slide 25. Smoothed fictitious play converges to ε-CCE

  • Definition. A coarse correlated equilibrium (CCE) is a probability distribution on strategy profiles, q ∈ ∆(X), such that no player can opt out and gain more in expected payoff. A coarse correlated ε-equilibrium (ε-CCE) is a probability distribution on strategy profiles such that no player can opt out and gain more in expectation than ε.

  • Theorem (Fudenberg & Levine, 1995). Let G be a finite game and let ε > 0. If all players use smoothed fictitious play with sufficiently small smoothing parameters, then with probability one empirical play will converge to the set of coarse correlated ε-equilibria.

Summary of the two theorems: smoothed fictitious play limits regret and converges to ε-CCE. There is another learning method with no regret and convergence to zero-CCE . . .

Slide 26. Exponentiated regret matching

Notation (j ranges over i’s actions, 1 ≤ j ≤ k):

  ū_i^t : i’s realised average payoff up to and including round t.
  φ_{−i}^t : the realised joint empirical distribution of i’s opponents.
  ū_i(j, φ_{−i}^t) : i’s hypothetical average payoff for playing action j against φ_{−i}^t.
  r̄_i^t(j) : player i’s regret for action j in round t, i.e., ū_i(j, φ_{−i}^t) − ū_i^t.

Exponentiated regret matching (PY, p. 59) is defined as

  q_i^{t+1}(j) ∝ [ r̄_i^t(j) ]_+^a,  where a > 0.

(For a = 1 we have ordinary regret matching.) An extended theorem on regret matching (Hart & Mas-Colell, 2001) ensures that individual players have no regret with probability one, and that the empirical distribution of play converges to the set of coarse correlated equilibria (PY, p. 60).
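The rule can be sketched directly from the definition (the function name and the uniform fallback when no regret is positive are our assumptions; the slides do not specify that case):

```python
def exponentiated_regret_matching(regrets, a):
    """Map a regret vector to a mixed strategy: probabilities proportional
    to the a-th power of the positive part of each regret.  Falls back to
    uniform play if no regret is positive (an assumption of this sketch)."""
    positive = [max(r, 0.0) ** a for r in regrets]
    total = sum(positive)
    k = len(regrets)
    if total == 0.0:
        return [1.0 / k] * k
    return [w / total for w in positive]

regrets = [0.3, -0.1, 0.1]
print(exponentiated_regret_matching(regrets, a=1))    # ordinary regret matching
print(exponentiated_regret_matching(regrets, a=10))   # mass concentrates on max regret
```

As the exponent a grows, the rule approaches a best reply to the regret vector, mirroring the comparison on the next slides.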

Slide 27. FP vs. Smoothed FP vs. Exponentiated RM

                                        FP    Smoothed FP                Exponentiated RM
  Depends on past play of opponents     √     √                          −
  Depends on own past payoffs           −     −                          √
  Puts zero probability on
    sub-optimal responses               √     −                          −
  Best response                         √     when γ_i ↓ 0               when a → ∞
  Individual no regret                  −     within ε > 0, almost       exact, almost always
                                              always (PY, p. 82)         (PY, p. 60)
  Collective convergence to coarse      −     within ε > 0, almost       exact, almost always
    correlated equilibria                     always (PY, p. 83)         (PY, p. 60)

Slide 28. FP vs. Smoothed FP vs. Exponentiated RM

Fictitious play: plays best responses.

  • Does depend on past play of opponent(s).
  • Puts zero probability on sub-optimal responses.

Smoothed fictitious play: plays sub-optimal responses, e.g., softmax-proportional to their estimated payoffs.

  • Does depend on past play of opponent(s).
  • Puts non-zero probability on sub-optimal responses.
  • Approaches fictitious play when the smoothing parameter γ_i ↓ 0.

Exponentiated regret matching: plays sub-optimally, proportional to a power of positive regret.

  • Does depend on own past payoffs.
  • Puts non-zero probability on sub-optimal responses.
  • Approaches fictitious play when the exponent a → ∞.

Slide 29. Part III: Finite memory and inertia

Slide 30. Finite memory: motivation

  • In their basic version, most learning rules rely on the entire history of play.
  • People, as well as computers, have a finite memory. (On the other hand, for average or discounted payoffs this is no real problem.)
  • Nevertheless: experiences in the distant past are apt to be less relevant than more recent ones.
  • Idea: let players have a finite memory of m rounds.

Slide 31. Inertia: motivation

  • When players’ strategies are constantly re-evaluated, discontinuities in behaviour are likely to occur. Example: the asymmetric coordination game.
  • Discontinuities in behaviour are less likely to lead to equilibria of any sort.
  • Idea: let players play the same action as in the previous round with probability λ.

Slide 32. Weakly acyclic games

  • Game G with action space X.
  • Better-reply graph G′ = (V, E), where V = X and

  E = { (x, y) | for some i : y−i = x−i and u_i(y_i, y−i) > u_i(x_i, y−i) }.

  • For all x ∈ X: x is a sink iff x is a Nash equilibrium.
  • G is said to be weakly acyclic under better replies if every node is connected to a sink.
  • WAuBR ⇒ ∃ Nash equilibrium.

Example payoff matrix:

  (1, 1) (2, 4) (4, 2)
  (1, 1) (4, 2) (2, 4)
  (3, 3) (1, 1) (1, 1)
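The better-reply graph and its sinks are easy to compute for small games. A sketch on a 2×2 pure coordination game (our own example, not the bimatrix above; function names are ours):

```python
from itertools import product

def better_reply_graph(payoffs, n_actions):
    """Edges x -> y where one player switches to a strictly better reply,
    the other player's action staying fixed.  `payoffs[x]` is the pair of
    payoffs (row player, column player) at joint action x = (r, c)."""
    edges = {x: [] for x in payoffs}
    for (r, c) in payoffs:
        for r2 in range(n_actions):          # row player deviates
            if payoffs[(r2, c)][0] > payoffs[(r, c)][0]:
                edges[(r, c)].append((r2, c))
        for c2 in range(n_actions):          # column player deviates
            if payoffs[(r, c2)][1] > payoffs[(r, c)][1]:
                edges[(r, c)].append((r, c2))
    return edges

def reaches_sink(x, edges, sinks, seen=frozenset()):
    """True iff some better-reply path from x ends in a sink."""
    if x in sinks:
        return True
    return any(y not in seen and reaches_sink(y, edges, sinks, seen | {x})
               for y in edges[x])

# Pure coordination game: payoff 1 for both iff the actions coincide.
coord = {x: ((1, 1) if x[0] == x[1] else (0, 0))
         for x in product(range(2), repeat=2)}
edges = better_reply_graph(coord, 2)
sinks = {x for x, ys in edges.items() if not ys}   # sinks = pure Nash equilibria

print(sorted(sinks))                                      # [(0, 0), (1, 1)]
print(all(reaches_sink(x, edges, sinks) for x in coord))  # True: weakly acyclic
```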

Slide 33. Examples of weakly acyclic games

Coordination games: two-person games with identical actions for all players, where the best responses are formed by the diagonal of the joint action space.

Potential games (Monderer and Shapley, 1996): there is a function ρ : X → R, called the potential, such that for every player i and all action profiles x, y ∈ X:

  y−i = x−i ⇒ u_i(y_i, y−i) − u_i(x_i, y−i) = ρ(y) − ρ(x).

Hence: the potential function increases along every better-reply path ⇒ paths cannot cycle ⇒ in finite graphs, every path must end (in a sink).

Slide 34. Weakly acyclic games under finite memory and inertia

  • Theorem. Let G be a finite weakly acyclic n-person game. Every better-reply process with finite memory and inertia converges to a pure Nash equilibrium of G.

  • Proof (outline).

  1. Let the state space Z be X^m.
  2. A state is called homogeneous if it consists of identical action profiles x. Such a state is denoted by x̄. Z∗ =def { homogeneous states }.
  3. Due to inertia, the process hits Z∗ infinitely many times.
  4. In a moment, it will be shown that the overall probability to play any action is bounded away from zero.
  5. In a moment, it will be shown that the set of absorbing states is identical to the set of homogeneous states that consist of pure Nash equilibria.
  6. Due to weak acyclicity, inertia, and (4), the process eventually lands in an absorbing state which, due to (5), is a repeated pure Nash equilibrium.

Slide 35. First claim: process hits Z∗ infinitely many times

Let inertia be determined by λ > 0. Then Pr(all players play their previous action) = λ^n, and hence Pr(all players play their previous action during m subsequent rounds) = λ^{nm}. If all players play their previous action during m subsequent rounds, the process arrives at a homogeneous state (and conversely). Hence, for all t,

  Pr(the process arrives at a homogeneous state in round t + m) ≥ λ^{nm}.

Recall the converse Borel–Cantelli lemma: if {E_n}_n are independent events and ∑_{n=1}^∞ Pr(E_n) is unbounded, then Pr(an infinite number of the E_n occur) = 1. This lemma applies if an infinite selection of disjoint histories is considered. Since histories have length ≤ m, this is always possible.

Slide 36. Second claim: the overall probability to play any action > 0

A better-reply learning method maps states (finite histories) to strategies (probability distributions on actions), γ_i : Z → ∆(X_i), and possesses the following important properties:

  i) It is deterministic (as a function of the state).
  ii) Every action is played with positive probability.

  1. Define γ_i = inf{ γ_i(z)(x_i) | z ∈ Z, x_i ∈ X_i }. Since Z and X_i are finite, the “inf” is a “min”, and γ_i > 0.
  2. Similarly, define γ = inf{ γ_i | 1 ≤ i ≤ n }. Since there are finitely many players, the “inf” is a “min”, and γ > 0.

Slide 37. Final claim: overall probability to reach a sink from Z∗ > 0

Suppose the process is in the homogeneous state x̄.

  1. If x is pure Nash, we are done, because response functions are deterministic better replies.
  2. If x is not pure Nash, there must be an edge x → y in the better-reply graph. Suppose this edge concerns player i, i.e., y differs from x only in i’s action y_i. We know that y_i is played with probability at least γ, irrespective of player and state.

Further probabilities:

  • All other players j ≠ i keep playing the same action: λ^{n−1}.
  • Edge x → y is actually traversed: γλ^{n−1}.
  • Profile y is maintained for another m − 1 rounds, so as to arrive at state ȳ: λ^{n(m−1)}.
  • To traverse from x̄ to ȳ: γλ^{n−1} · λ^{n(m−1)} = γλ^{nm−1}.
  • The image x̄(1), . . . , x̄(l) of a better-reply path x(1), . . . , x(l) is followed to a sink: probability ≥ (γλ^{nm−1})^L, where L is the length of a longest better-reply path.

Since Z∗ is encountered infinitely often, the result follows.
Slide 38. Summary

  • With fictitious play, the behaviour of opponents is modelled by (or represented by, or projected on) a mixed strategy.
  • Fictitious play ignores sub-optimal actions.
  • There is a family of so-called better-reply learning rules that i) play sub-optimal actions, and ii) can be brought arbitrarily close to fictitious play.
  • In weakly acyclic n-person games, every better-reply process with finite memory and inertia converges to a pure Nash equilibrium.

Slide 39. What next?

Bayesian play:

  • With fictitious play, the behaviour of opponents is modelled by a single mixed strategy.
  • With Bayesian play, opponents are modelled by a probability distribution over (a possibly confined set of) mixed strategies.

Gradient dynamics:

  • Like fictitious play, players model (or assess) each other through mixed strategies.
  • Due to CKR (common knowledge of rationality; cf. Hargreaves Heap & Varoufakis, 2004), all models of mixed strategies are correct (i.e., q−i = s−i, for all i).
  • Mixed strategies are actually not played.
  • Players gradually adapt their mixed strategies through hill-climbing in the payoff space.
