Algorithms for Multiagent Learning

Outline

  • A. Introduction
  • B. Single Agent Learning
  • C. Game Theory
  • D. Multiagent Learning
  • E. Future Issues and Open Problems


Algorithms for Multiagent Learning

  • Equilibrium Learners
  • Regret Minimizing Algorithms
  • Best Response Learners
    – Q-Learning
    – Opponent Modeling Q-Learning
    – Gradient Ascent
    – WoLF
  • Learning to Coordinate


What’s the Goal?

  • Learn a best response, if one exists.
  • Make some other guarantees. For example,
    – Convergence of payoffs or policies.
    – Low regret, or at least minimax optimal.
  • If best-response learners converge against each other, then it must be to a Nash equilibrium.


Q-Learning

  • ... or any MDP learning algorithm.
  • The most commonly used approach to learning in multiagent systems. And not without success.
  • If it is the only learning agent. . .
    – Recall, if the other agents are using stationary strategies, the environment becomes an MDP.
    – Q-learning will then converge to a best response (see the sketch below).

  • Otherwise, requires on-policy learning.
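To make the single-learner case concrete, here is a minimal tabular Q-learning sketch; the state encoding, epsilon-greedy exploration, and hyperparameters are illustrative assumptions, not part of the original tutorial.

```python
import random
from collections import defaultdict

class QLearner:
    """Minimal tabular Q-learner; other agents are folded into the environment."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)          # maps (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # Epsilon-greedy exploration over the learner's own actions.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard off-policy TD(0) update toward the greedy target.
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (target - self.Q[(state, action)])
```

If the other agents play stationary strategies, this is exactly Q-learning on the induced MDP and inherits its convergence guarantee.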


Q-Learning

  • It has also been successfully applied to. . .
    – Team games. (Sen et al., 1994; Claus & Boutilier, 1998)
    – Games with pure strategy equilibria. (Tan, 1993; Crites & Sandholm, 1995; Bowling, 2000)
    – Dominance solvable games.
    – Adversarial games. (Tesauro, 1995; Uther, 1997)
  • TD-Gammon remains one of the most convincing successes of reinforcement learning.


Opponent Modeling Q-Learning

(Uther, 1997) and others.

  • Fictitious play in stochastic games using approximation.
  • Choose the action that maximizes,

$$V(s) = \max_{a} \sum_{a_{-i}} \frac{C(s, a_{-i})}{n(s)}\, Q(s, \langle a, a_{-i} \rangle)$$

where $a$ is the learner's action, $a_{-i}$ is the others' joint action, $C(s, a_{-i})$ counts how often $a_{-i}$ has been observed in $s$, and $n(s)$ counts visits to $s$.

  • Update opponent model and Q-values,

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \sum_{s'} T(s, a, s')\, V(s') - Q(s, a) \right)$$

$$C(s, a_{-i}) \leftarrow C(s, a_{-i}) + 1, \qquad n(s) \leftarrow n(s) + 1$$
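A minimal sketch of these updates in code; it is illustrative, and it uses the sampled next state in place of the transition model $T$, i.e., the model-free form of the update above.

```python
from collections import defaultdict

class OMQLearner:
    """Sketch of opponent modeling Q-learning for a single learner."""
    def __init__(self, my_actions, opp_actions, alpha=0.1, gamma=0.95):
        self.Q = defaultdict(float)   # (state, (a, a_opp)) -> value
        self.C = defaultdict(int)     # (state, a_opp) -> opponent-action count
        self.n = defaultdict(int)     # state -> total observations
        self.my_actions, self.opp_actions = my_actions, opp_actions
        self.alpha, self.gamma = alpha, gamma

    def value(self, state):
        # V(s) = max_a sum_{a_opp} C(s, a_opp)/n(s) * Q(s, <a, a_opp>)
        if self.n[state] == 0:
            return 0.0
        return max(
            sum(self.C[(state, o)] / self.n[state] * self.Q[(state, (a, o))]
                for o in self.opp_actions)
            for a in self.my_actions)

    def update(self, state, a, a_opp, reward, next_state):
        # Sample-based form: the observed next state stands in for sum over T.
        joint = (state, (a, a_opp))
        target = reward + self.gamma * self.value(next_state)
        self.Q[joint] += self.alpha * (target - self.Q[joint])
        self.C[(state, a_opp)] += 1
        self.n[state] += 1
```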


Opponent Modeling Q-Learning

  • Superficially less naive than Q-learning.

    – Recognizes the existence of other agents.
    – . . . but assumes they use a stationary policy.

  • Similar results to Q-learning, but faster approximation.

(Uther, 1997) — Hexcer results (entries are the row algorithm's share of wins against the column algorithm):

First 50,000 games:

        MMQ    Q     OMQ
MMQ      —    27%   32%
Q       73%    —    40%
OMQ     68%   60%    —

Second 50,000 games:

        MMQ    Q     OMQ
MMQ      —    45%   43%
Q       55%    —    41%
OMQ     57%   59%    —


Gradient Ascent

  • Compute the gradient of the value with respect to the player's strategy.
  • Adjust the policy to increase value.
  • Single-agent learning (parameterized policies).

(Williams, 1993; Sutton et al., 2000; Baxter & Bartlett, 2000)

  • Multiagent Learning.

(Singh, Kearns, & Mansour, 2000; Bowling & Veloso, 2002, 2003; Zinkevich, 2003)


Infinitesimal Gradient Ascent

(Singh, Kearns, & Mansour, 2000)

Payoff matrices for the row and column players:

$$R_r = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix}, \qquad R_c = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}$$

With the row player playing its first action with probability $\alpha$ and the column player playing its first action with probability $\beta$, the row player's expected payoff is

$$V_r(\alpha, \beta) = \alpha\beta\, r_{11} + \alpha(1-\beta)\, r_{12} + (1-\alpha)\beta\, r_{21} + (1-\alpha)(1-\beta)\, r_{22} = u\,\alpha\beta + \alpha(r_{12} - r_{22}) + \beta(r_{21} - r_{22}) + r_{22}$$

where,

$$u = r_{11} - r_{12} - r_{21} + r_{22}$$


IGA

$$\frac{\partial V_r(\alpha, \beta)}{\partial \alpha} = \beta u + (r_{12} - r_{22}), \qquad \frac{\partial V_c(\alpha, \beta)}{\partial \beta} = \alpha u' + (c_{21} - c_{22})$$

$$\alpha_{k+1} = \alpha_k + \eta\, \frac{\partial V_r(\alpha_k, \beta_k)}{\partial \alpha}, \qquad \beta_{k+1} = \beta_k + \eta\, \frac{\partial V_c(\alpha_k, \beta_k)}{\partial \beta}$$

where $u' = c_{11} - c_{12} - c_{21} + c_{22}$.
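These update rules are easy to simulate. The sketch below runs finite-step IGA on matching pennies; the game, step size, and starting point are illustrative choices. The strategies orbit the mixed equilibrium at (0.5, 0.5) rather than converging to it.

```python
import numpy as np

# Matching pennies payoffs for the row and column players (illustrative).
R = np.array([[1.0, -1.0], [-1.0, 1.0]])
C = -R

u_r = R[0, 0] - R[0, 1] - R[1, 0] + R[1, 1]
u_c = C[0, 0] - C[0, 1] - C[1, 0] + C[1, 1]

alpha, beta, eta = 0.8, 0.2, 0.01     # initial strategies and step size
for t in range(5000):
    grad_a = beta * u_r + (R[0, 1] - R[1, 1])     # dV_r / d(alpha)
    grad_b = alpha * u_c + (C[1, 0] - C[1, 1])    # dV_c / d(beta)
    alpha = np.clip(alpha + eta * grad_a, 0.0, 1.0)
    beta = np.clip(beta + eta * grad_b, 0.0, 1.0)

print(alpha, beta)   # the pair circles (0.5, 0.5) instead of settling there
```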


IGA — Theorem

(Singh et al., 2000)

  • Theorem. If both players follow Infinitesimal Gradient Ascent (IGA), where $\eta \to 0$, then their strategies will converge to a Nash equilibrium OR the average payoffs over time will converge in the limit to the expected payoffs of a Nash equilibrium.


IGA — Proof

The joint strategy dynamics form an affine dynamical system,

$$\begin{pmatrix} \dot{\alpha} \\ \dot{\beta} \end{pmatrix} = \begin{pmatrix} 0 & u \\ u' & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \begin{pmatrix} r_{12} - r_{22} \\ c_{21} - c_{22} \end{pmatrix}$$

The qualitative dynamics depend on the off-diagonal multiplier matrix $U$; three cases arise:

  • $U$ is not invertible,
  • $U$ has real eigenvalues,
  • $U$ has imaginary eigenvalues.

[Figure: phase portraits of the three cases.]

IGA — Summary

  • One of the first convergence proofs for a payoff-maximizing multiagent learning algorithm.
  • Expected payoffs do not necessarily converge.

[Figure: time-average reward over time.]


GIGA

(Zinkevich, 2003)

  • Generalized Infinitesimal Gradient Ascent (GIGA).
    – At time $t$, select actions according to $x_t$.
    – After observing the others select $a_{-i,t}$,

$$x_{t+1} = \operatorname*{argmin}_{x \in PD(A)} \left\| x - \left( x_t + \eta_t\, r(\cdot, a_{-i,t}) \right) \right\|$$

i.e., step the probability distribution toward immediate reward, then project back into the space of valid probability distributions.


GIGA

  • GIGA is identical to IGA for two-player, two-action games, while approximating the gradient.

IGA:

$$\alpha_{t+1} = \alpha_t + \eta_t \left( \beta_t u + (r_{12} - r_{22}) \right)$$

GIGA:

$$x_{t+1} = \operatorname*{argmin}_{x \in PD(A)} \left\| x - \left( x_t + \eta_t\, r(\cdot, a_{-i,t}) \right) \right\|$$

  • GIGA is universally consistent!
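A sketch of one GIGA step, assuming the standard Euclidean projection onto the probability simplex and the step-size schedule $\eta_t = 1/\sqrt{t}$ from Zinkevich's analysis; the reward-vector interface is an illustrative assumption.

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection of y onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(y)[::-1]                  # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(y) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(y + theta, 0.0)

def giga_step(x, reward_vec, t):
    """One GIGA update: step toward immediate reward, then project."""
    eta = 1.0 / np.sqrt(t)
    return project_to_simplex(x + eta * reward_vec)

# Usage: x is the current mixed strategy over this player's actions;
# reward_vec[i] is the payoff action i would have earned this round.
x = np.ones(3) / 3
x = giga_step(x, np.array([1.0, 0.0, -1.0]), t=1)
```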


GIGA — Intuition

  • Assumption: Policy gradient is bounded.

WoLF

(Bowling & Veloso, 2002, 2003)

  • Modify gradient ascent learning to converge.
  • Vary the speed of learning: Win or Learn Fast.

    – If winning, learn cautiously.
    – If losing, learn quickly.

  • Algorithms: WoLF-IGA, WoLF-PHC, GraWoLF.


WoLF-IGA

$$\alpha_{t+1} = \alpha_t + \eta\, \ell^r_t\, \frac{\partial V_r(\alpha_t, \beta_t)}{\partial \alpha}, \qquad \beta_{t+1} = \beta_t + \eta\, \ell^c_t\, \frac{\partial V_c(\alpha_t, \beta_t)}{\partial \beta}$$

where $\ell^r_t, \ell^c_t \in \{\ell_{\min}, \ell_{\max}\}$ with $\ell_{\max} > \ell_{\min} > 0$.


WoLF-IGA

$$\alpha_{t+1} = \alpha_t + \eta\, \ell^r_t\, \frac{\partial V_r(\alpha_t, \beta_t)}{\partial \alpha}, \qquad \beta_{t+1} = \beta_t + \eta\, \ell^c_t\, \frac{\partial V_c(\alpha_t, \beta_t)}{\partial \beta}$$

WoLF

Win or Learn Fast!

$$\ell^r_t = \begin{cases} \ell_{\min} & \text{if } V_r(\alpha_t, \beta_t) > V_r(\alpha^e, \beta_t) \quad \text{(WINNING)} \\ \ell_{\max} & \text{otherwise} \quad \text{(LOSING)} \end{cases}$$

$$\ell^c_t = \begin{cases} \ell_{\min} & \text{if } V_c(\alpha_t, \beta_t) > V_c(\alpha_t, \beta^e) \quad \text{(WINNING)} \\ \ell_{\max} & \text{otherwise} \quad \text{(LOSING)} \end{cases}$$

where $(\alpha^e, \beta^e)$ is some Nash equilibrium.
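As a sketch, the winning test drops straight into the IGA update. Here the equilibrium strategy $\alpha^e$ is assumed known, as in the analysis; the practical algorithms later in this section replace it with an average policy.

```python
def wolf_iga_step(alpha, beta, R, l_min, l_max, eta, alpha_e):
    """One WoLF-IGA update for the row player (sketch; equilibrium known)."""
    u = R[0][0] - R[0][1] - R[1][0] + R[1][1]

    def V(a, b):  # row player's expected payoff
        return (a * b * u + a * (R[0][1] - R[1][1])
                + b * (R[1][0] - R[1][1]) + R[1][1])

    winning = V(alpha, beta) > V(alpha_e, beta)   # compare to equilibrium play
    l = l_min if winning else l_max               # Win or Learn Fast
    grad = beta * u + (R[0][1] - R[1][1])         # dV_r / d(alpha)
    return min(1.0, max(0.0, alpha + eta * l * grad))
```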


WoLF-IGA — Theorem

  • Theorem. If both players follow WoLF-IGA, where $\eta \to 0$ and $\ell_{\max} > \ell_{\min}$, then their strategies will converge to a Nash equilibrium.


WoLF-IGA — Proof

$$\begin{pmatrix} \dot{\alpha} \\ \dot{\beta} \end{pmatrix} = \begin{pmatrix} 0 & \ell^r u \\ \ell^c u' & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \begin{pmatrix} \ell^r (r_{12} - r_{22}) \\ \ell^c (c_{21} - c_{22}) \end{pmatrix}$$

  • Lemma. Qualitative dynamics is unchanged.
  • Lemma. Sign of gradient is unchanged.

The same three cases arise: $U$ is not invertible, $U$ has real eigenvalues, or $U$ has imaginary eigenvalues.

[Figure: phase portraits of the three cases.]


WoLF-IGA — Proof

  • Lemma. A player's strategy is moving away from the equilibrium if and only if they are "winning". I.e.,

$$V_r(\alpha, \beta) - V_r(\alpha^e, \beta) > 0 \iff (\alpha - \alpha^e)\, \frac{\partial V_r(\alpha, \beta)}{\partial \alpha} > 0$$

Proof:

$$\begin{aligned} V_r(\alpha, \beta) - V_r(\alpha^e, \beta) &= u\alpha\beta + \alpha(r_{12} - r_{22}) + \beta(r_{21} - r_{22}) + r_{22} \\ &\quad - \left( u\alpha^e\beta + \alpha^e(r_{12} - r_{22}) + \beta(r_{21} - r_{22}) + r_{22} \right) \\ &= (\alpha - \alpha^e)\,\beta u + (\alpha - \alpha^e)(r_{12} - r_{22}) \\ &= (\alpha - \alpha^e)\left( \beta u + (r_{12} - r_{22}) \right) \\ &= (\alpha - \alpha^e)\, \frac{\partial V_r(\alpha, \beta)}{\partial \alpha} \end{aligned}$$


WoLF-IGA — Proof

[Figure: joint-strategy trajectories in each qualitative case under WoLF-IGA.]


WoLF-IGA — Proof — Summary

  • Theorem. If both players follow WoLF gradient ascent with $\ell_{\max} > \ell_{\min}$, then their strategies will converge to a Nash equilibrium.

[Figure: phase portraits of the three cases ($U$ not invertible, real eigenvalues, imaginary eigenvalues); with WoLF the trajectories converge to the equilibrium.]


WoLF-IGA — Corollary

Corollary. If both players follow the WoLF-IGA algorithm but with different $\ell_{\min}$ and $\ell_{\max}$, then their strategies will converge to a Nash equilibrium if,

$$\ell^r_{\min}\, \ell^c_{\min} < \ell^r_{\max}\, \ell^c_{\max}$$

Specifically, WoLF-IGA (with $\ell_{\max} > \ell_{\min}$) versus IGA ($\ell_{\max} = \ell_{\min}$) will converge to a Nash equilibrium.


Practical Versions of WoLF

  • WoLF Policy Hill-Climbing (WoLF-PHC)

    – Combines WoLF with a Q-learning-like algorithm that can learn stochastic policies (see the sketch below).
    – Shown empirically to converge in a variety of stochastic games.

  • Gradient-Based WoLF (GraWoLF)

    – Combines WoLF with a policy gradient technique.
    – Learned policies in goofspiel and an adversarial robot task.
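A compressed sketch of the WoLF-PHC policy step for a single state. The Q-values come from an ordinary Q-learning update (not shown); the running average policy supplies the winning test, since the true equilibrium is unknown; the delta values are illustrative.

```python
import numpy as np

def wolf_phc_policy_update(pi, pi_bar, Q_s, count, delta_w=0.01, delta_l=0.04):
    """One WoLF-PHC policy step for a single state (sketch).

    pi, pi_bar : current and average policies over actions at this state
    Q_s        : learned Q-values for each action at this state
    count      : number of visits to this state (for the running average)
    """
    # Update the running average policy.
    pi_bar += (pi - pi_bar) / count

    # Win or Learn Fast: winning if the current policy outperforms the average.
    winning = np.dot(pi, Q_s) > np.dot(pi_bar, Q_s)
    delta = delta_w if winning else delta_l     # learn fast when losing

    # Move probability mass toward the greedy action, staying on the simplex.
    best = int(np.argmax(Q_s))
    for a in range(len(pi)):
        if a != best:
            step = min(pi[a], delta / (len(pi) - 1))
            pi[a] -= step
            pi[best] += step
    return pi, pi_bar
```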


Algorithms for Multiagent Learning

  • Equilibrium Learners
  • Best Response Learners
  • Learning to Coordinate
    – ILs and JALs
    – Brafman and Tennenholtz
    – Optimal Adaptive Learning


ILs and JALs

(Claus & Boutilier, 1998)

  • ILs (Independent Learners): Q-learning.
  • JALs (Joint Action Learners): opponent modeling Q-learning.
  • Guaranteed to converge to a Nash equilibrium.
  • Not necessarily an optimal Nash equilibrium.

[Example: a 3×3 matrix game with actions A, B, and C in which the learners can converge to a suboptimal equilibrium.]


Optimal Adaptive Learning

(Wang & Sandholm, 2002)

  • Learn an optimal Nash equilibrium.
  • Q-Learning plus a coordinating mechanism.

    – Learn Q-values.
    – Construct a per-state virtual game from the Q-values.
    – Use biased adaptive play on the virtual games.

  • Adaptive play “fixes” fictitious play. (Young, 1993)
  • Biased adaptive play “fixes” adaptive play.


Optimal Adaptive Learning

  • Virtual Game: from the Q-values learned at a state, construct a game that pays 1 for every optimal joint action and 0 for every other joint action (sketched below).

[Example: a sequence of 2×2 games with actions A and B, showing Q-values and the virtual games derived from them.]
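A sketch of the per-state construction, assuming the joint-action Q-values are held in a matrix; the tolerance is an illustrative guard against floating-point ties.

```python
import numpy as np

def virtual_game(Q_s, tol=1e-9):
    """Build the virtual game for one state from joint-action Q-values.

    Q_s is an |A1| x |A2| array; the virtual game pays 1 for every
    Q-maximizing joint action and 0 otherwise.
    """
    return (Q_s >= Q_s.max() - tol).astype(int)

# Example: two optimal joint actions, (A, A) and (B, B).
Q_s = np.array([[10.0, 0.0],
                [0.0, 10.0]])
print(virtual_game(Q_s))   # [[1 0]
                           #  [0 1]]
```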


Optimal Adaptive Learning

  • Adaptive Play — Adds Randomness

    – Randomize among all best responses.
    – Sample randomly from past history (see the sketch after this list).

[Example: a 2×2 virtual game with actions A and B.]

– Overcomes the pathological cases.
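A sketch of the adaptive-play action choice; the history layout, sample size k, and exact-equality best-response test are illustrative assumptions.

```python
import random
from collections import Counter

def adaptive_play_action(history, payoff, my_actions, k):
    """Best-respond to a random sample of the opponent's play history.

    history      : list of past opponent actions (most recent last)
    payoff[a][o] : my payoff for action a against opponent action o
    """
    sample = random.sample(history, k) if len(history) >= k else history
    counts = Counter(sample)

    def value(a):   # expected payoff of a against the sampled empirical play
        return sum(payoff[a][o] * c for o, c in counts.items())

    best_val = max(value(a) for a in my_actions)
    # Randomize among all best responses.
    return random.choice([a for a in my_actions if value(a) == best_val])
```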


Optimal Adaptive Learning

  • Biased Adaptive Play — Removes Randomness

[Example: a 2×2 virtual game with actions A and B.]

    – Deterministically choose the most recent best response (under certain circumstances).
    – Can converge to weak Nash equilibria.

  • Guaranteed to converge to an optimal equilibrium.


Brafman and Tennenholtz

(Brafman & Tennenholtz, 2002, 2003)

  • Learn an optimal Nash equilibrium.
  • Learn it in polynomial time.


Brafman and Tennenholtz

  • Normal-Form Games — Simple Solution

    – Randomize over all actions for N steps.
    – Select the globally optimal joint action (see the sketch below).
    – Key Fact: Choose N large enough to make sure all joint actions are played with high probability ($1 - \delta$). A sufficient N is polynomial in the number of joint actions and in $\ln(1/\delta)$.
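A sketch of this scheme, written as a centralized simulation for brevity; in the actual protocol each player randomizes its own action and observes the realized joint action. The common-payoff signature and shared-ordering tie-break are assumptions of the sketch.

```python
import random

def bt_normal_form(n_actions, N, play_round, my_index):
    """Sketch of the simple normal-form coordination scheme.

    n_actions  : tuple with each player's action-set size
    play_round : executes a joint action and returns the observed payoff
    """
    estimates = {}
    for _ in range(N):                          # randomize for N steps
        joint = tuple(random.randrange(n) for n in n_actions)
        estimates[joint] = play_round(joint)    # all players observe this

    # With N large enough, every joint action appears w.h.p. (1 - delta),
    # and every player computes the same argmax under the shared ordering.
    best = max(sorted(estimates), key=lambda j: estimates[j])
    return best[my_index]                       # play my portion of it
```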


Brafman and Tennenholtz

  • Stochastic Games — Less Simple

    – Relies on an MDP algorithm: R-MAX.
      • Near optimal, polynomial time algorithm.
      • Deterministic.
    – Each player runs R-MAX on the joint action space.
    – The players then select their portion of the selected joint action.


Brafman and Tennenholtz

  • Assumptions. . .

    – Agents and action sets are ordered and known.
    – Guarantees agents select the same joint action.
    – Assumption can be relaxed. . .
      • Loop over all possible action set sizes.
      • Guess a random ordering of agents.
      • Run the algorithm.
      • Choose the learned policy with the best reward.


Outline

  • A. Introduction
  • B. Single Agent Learning
  • C. Game Theory
  • D. Multiagent Learning
  • E. Future Issues and Open Problems
    – Graphical Games
    – Equilibria as a Solution Concept


Algorithms in Equilibrium

(Brafman & Tennenholtz, 2003; Littman & Stone, 2003)

  • Learning a Nash equilibrium is unimportant.
  • Algorithms are themselves (non-Markovian) strategies.
  • Algorithms themselves should be in equilibrium.
  • Questions. . .

– What about Folk Theorems?
– What about learning a "learning" strategy through repeated play? Is this an infinite regress?


Acknowledgements

We wish to thank the following people whose insights, explanations, and/or slides have found their way into this tutorial.

  • Michael Kearns
  • Amy Greenwald
  • Gerry Tesauro
  • Will Uther


Questions and Discussion
