  1. Multiagent Evaluation under Incomplete Information
     Mark Rowland*, Shayegan Omidshafiei*, Karl Tuyls, Julien Pérolat, Michal Valko, Georgios Piliouras†, Rémi Munos
     * Equal contributors   † Singapore University of Technology and Design

  2. Motivation
     ● Problem of interest: multiagent evaluation under incomplete information
       ○ Agent evaluation in >2-player, general-sum games with noisy payoffs
       ○ An algorithm maps an estimated payoff table to an estimated ranking vector
     ● Prototypical application: multiagent iterative training
       1. Training: train agents via game simulations in the underlying game
       2. Meta-game synthesis: construct a meta-game comparing the performance of all agent match-ups
       3. Evaluation: evaluate (i.e., rank or score) agents in the meta-game
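The meta-game synthesis step can be sketched as below. This is a minimal illustration, not the authors' code: `estimate_payoff_table` and `noisy_match` are hypothetical names, and the Gaussian noise model (standard deviation 0.1) is an assumption chosen only to make the observed payoffs noisy. Each match-up is simulated a fixed number of times and each player's payoff is averaged, giving the estimated payoff table that a ranking algorithm then consumes.

```python
import random
from itertools import product
from statistics import mean


def estimate_payoff_table(strategy_sets, simulate, num_samples):
    """Build an empirical meta-game payoff table from repeated noisy simulations.

    strategy_sets: one list of agent names per player.
    simulate: hypothetical callable mapping a profile (one agent per player)
        to a tuple of per-player payoffs for a single simulated match.
    Returns {profile: tuple of per-player mean payoffs}.
    """
    table = {}
    for profile in product(*strategy_sets):
        samples = [simulate(profile) for _ in range(num_samples)]
        # Average each player's payoff across the simulated matches of this match-up.
        table[profile] = tuple(mean(sample[k] for sample in samples)
                               for k in range(len(strategy_sets)))
    return table


# Toy stand-in for the underlying game (an assumption): the 3x3 payoff table
# from the alpha-Rank overview slides, observed through Gaussian noise.
TRUE_PAYOFFS = {
    ("U", "L"): (2, 1), ("U", "C"): (1, 2), ("U", "R"): (0, 0),
    ("M", "L"): (1, 2), ("M", "C"): (2, 1), ("M", "R"): (1, 0),
    ("D", "L"): (0, 0), ("D", "C"): (0, 1), ("D", "R"): (2, 2),
}
rng = random.Random(0)


def noisy_match(profile):
    """One simulated match: true payoffs plus hypothetical Gaussian noise."""
    return tuple(p + rng.gauss(0.0, 0.1) for p in TRUE_PAYOFFS[profile])


estimated = estimate_payoff_table([["U", "M", "D"], ["L", "C", "R"]],
                                  noisy_match, num_samples=200)
```

With 200 samples per match-up, every estimated entry lands close to its true value; the remaining error is exactly the payoff uncertainty the rest of the talk is concerned with.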

  4. Multiagent Evaluation at a Glance: α-Rank Overview
     1. Construct a response graph capturing player-wise evolutionary deviations: a graph over the pure strategy profiles (U,L), (U,C), (U,R), (M,L), (M,C), (M,R), (D,L), (D,C), (D,R), with a directed edge from one profile to another whenever the deviating player's new strategy is a better response (it gives that player a higher payoff).

        Example game (entries are Player 1's payoff, Player 2's payoff):

                             Player 2
                          L       C       R
           Player 1   U   2, 1    1, 2    0, 0
                      M   1, 2    2, 1    1, 0
                      D   0, 0    0, 1    2, 2
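A minimal sketch of step 1 for the example game, assuming that "better response" means a strict payoff improvement for the deviating player; `PAYOFFS` and `response_graph` are hypothetical names. Each profile is connected to every profile that differs in exactly one player's strategy and gives that player a higher payoff.

```python
from itertools import product

ROWS, COLS = ["U", "M", "D"], ["L", "C", "R"]
# PAYOFFS[(row, col)] = (Player 1 payoff, Player 2 payoff), copied from the slide.
PAYOFFS = {
    ("U", "L"): (2, 1), ("U", "C"): (1, 2), ("U", "R"): (0, 0),
    ("M", "L"): (1, 2), ("M", "C"): (2, 1), ("M", "R"): (1, 0),
    ("D", "L"): (0, 0), ("D", "C"): (0, 1), ("D", "R"): (2, 2),
}


def response_graph(payoffs, strategy_sets):
    """Return directed edges (s, t) between pure strategy profiles.

    An edge s -> t exists when t differs from s in exactly one player's
    strategy and that player strictly prefers t (a better response).
    """
    edges = []
    for s in product(*strategy_sets):
        for k, strategies in enumerate(strategy_sets):
            for alt in strategies:
                if alt == s[k]:
                    continue
                t = s[:k] + (alt,) + s[k + 1:]  # player k deviates to alt
                if payoffs[t][k] > payoffs[s][k]:
                    edges.append((s, t))
    return edges


edges = response_graph(PAYOFFS, [ROWS, COLS])
# Profiles with no outgoing edge: no player can improve by deviating alone.
sinks = [s for s in PAYOFFS if not any(src == s for src, _ in edges)]
```

In this game the graph has 18 edges. (D,R) is the only profile without an outgoing edge, and (U,L) → (U,C) → (M,C) → (M,L) → (U,L) forms a cycle from which no better-response edge escapes.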

  6. Multiagent Evaluation at a Glance: α-Rank Overview
     1. Construct the response graph over the pure strategy profiles, as above.
     2. Perturb the response graph with evolutionary mutations, which ensures a unique stationary distribution.
     3. Use the stationary distribution's mass on each profile as its α-Rank score.
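Steps 2 and 3 can be sketched as follows, assuming the multi-population transition model of the original α-Rank formulation: each player has a population of size `m`, a deviation by player k from profile s to t is accepted with a fixation probability that depends on the payoff gain and a ranking intensity `alpha`, and that probability is scaled by `eta`, the reciprocal of the number of single-player deviations available from a profile. The values `alpha=0.1` and `m=50`, and all function names, are illustrative assumptions. Because even worse deviations keep a small positive probability, the chain is irreducible and its stationary distribution is unique. It is computed here by solving a linear system rather than by power iteration, which can converge very slowly when transition probabilities differ by orders of magnitude.

```python
import math
from itertools import product

STRATEGY_SETS = [["U", "M", "D"], ["L", "C", "R"]]
PAYOFFS = {  # (Player 1 payoff, Player 2 payoff), from the example game
    ("U", "L"): (2, 1), ("U", "C"): (1, 2), ("U", "R"): (0, 0),
    ("M", "L"): (1, 2), ("M", "C"): (2, 1), ("M", "R"): (1, 0),
    ("D", "L"): (0, 0), ("D", "C"): (0, 1), ("D", "R"): (2, 2),
}


def fixation_probability(gain, alpha, m):
    """Chance a single mutant with payoff advantage `gain` takes over a population of size m."""
    if gain == 0:
        return 1.0 / m  # neutral drift
    # (1 - e^{-alpha*gain}) / (1 - e^{-m*alpha*gain}), written with expm1 for accuracy.
    return math.expm1(-alpha * gain) / math.expm1(-m * alpha * gain)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def alpha_rank(payoffs, strategy_sets, alpha=0.1, m=50):
    """Stationary distribution of an alpha-Rank-style multi-population Markov chain."""
    profiles = list(product(*strategy_sets))
    index = {s: i for i, s in enumerate(profiles)}
    n = len(profiles)
    eta = 1.0 / sum(len(S) - 1 for S in strategy_sets)
    # Q = C - I; each diagonal entry is minus its row's off-diagonal sum, so tiny
    # transition probabilities are not lost to rounding in 1 - sum(...).
    Q = [[0.0] * n for _ in range(n)]
    for s in profiles:
        i = index[s]
        for k, strategies in enumerate(strategy_sets):
            for alt in strategies:
                if alt == s[k]:
                    continue
                t = s[:k] + (alt,) + s[k + 1:]
                # Worse deviations still get positive probability (the perturbation).
                rate = eta * fixation_probability(payoffs[t][k] - payoffs[s][k], alpha, m)
                Q[i][index[t]] += rate
                Q[i][i] -= rate
    # pi Q = 0 is Q^T pi = 0; replace one redundant equation with sum(pi) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return dict(zip(profiles, solve_linear(A, b)))


scores = alpha_rank(PAYOFFS, STRATEGY_SETS)
```

With these settings nearly all of the mass falls on the cycle (U,L), (U,C), (M,C), (M,L) and on (D,R), the parts of the response graph that better-response moves cannot leave.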

  7. Multiagent Evaluation at a Glance: α-Rank Overview
     Same three steps, but Player 2's payoffs at (U,L) and (U,C) are now only known to lie in the interval [1, 2]:

                             Player 2
                          L          C          R
           Player 1   U   2, [1,2]   1, [1,2]   0, 0
                      M   1, 2       2, 1       1, 0
                      D   0, 0       0, 1       2, 2

     Since either of these two payoffs could be the larger, the direction of the response-graph edge between (U,L) and (U,C) is no longer determined.
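A sketch of how interval payoffs affect the response graph, assuming each payoff entry comes with a (lower, upper) bound and treating an edge direction as certain only when the two relevant intervals do not overlap. `INTERVALS`, `classify_deviations`, and `to_sample` are hypothetical names. The last line illustrates, as an assumption rather than the authors' algorithm, how an adaptive scheme might concentrate further simulations on profiles whose comparisons are still unresolved.

```python
from itertools import product

STRATEGY_SETS = [["U", "M", "D"], ["L", "C", "R"]]
EXACT = {
    ("U", "L"): (2, 1), ("U", "C"): (1, 2), ("U", "R"): (0, 0),
    ("M", "L"): (1, 2), ("M", "C"): (2, 1), ("M", "R"): (1, 0),
    ("D", "L"): (0, 0), ("D", "C"): (0, 1), ("D", "R"): (2, 2),
}
# INTERVALS[profile][k] = (lower, upper) bound on player k's payoff.
INTERVALS = {s: tuple((v, v) for v in p) for s, p in EXACT.items()}
INTERVALS[("U", "L")] = ((2, 2), (1, 2))  # Player 2's payoff only known to lie in [1, 2]
INTERVALS[("U", "C")] = ((1, 1), (1, 2))


def classify_deviations(intervals, strategy_sets):
    """Split single-player deviation pairs into certain edges and ambiguous pairs."""
    certain, ambiguous = [], []
    for s in product(*strategy_sets):
        for k, strategies in enumerate(strategy_sets):
            for alt in strategies:
                t = s[:k] + (alt,) + s[k + 1:]
                if not s < t:  # skip s itself and visit each unordered pair once
                    continue
                s_lo, s_hi = intervals[s][k]
                t_lo, t_hi = intervals[t][k]
                if t_lo > s_hi:
                    certain.append((s, t))    # player k surely prefers t
                elif s_lo > t_hi:
                    certain.append((t, s))    # player k surely prefers s
                else:
                    ambiguous.append((s, t))  # overlapping intervals: direction unknown
    return certain, ambiguous


certain, ambiguous = classify_deviations(INTERVALS, STRATEGY_SETS)
# Hypothetical adaptive heuristic: only profiles in unresolved comparisons need more samples.
to_sample = {p for pair in ambiguous for p in pair}
```

Here only the comparison between (U,L) and (U,C) is ambiguous; the other 17 edges keep the direction they had with exact payoffs.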

  9. From Uncertainty in Payoffs to Rankings
     ● Key question: given confidence bounds on the payoff table entries, can we efficiently compute a range of plausible α-Rank weights for the agents?
     [Figure annotation: top-ranked agent when there is no payoff uncertainty]
     ● Takeaway: payoff uncertainties need careful consideration when ranking agents.
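One naive way to approach the key question, shown only as a baseline and not the efficient method the slides refer to: draw payoff tables uniformly from the confidence intervals, compute α-Rank weights for each, and record the smallest and largest weight per profile. The α-Rank transition model, `alpha=0.1`, `m=50`, uniform sampling, and the sample count are all assumptions, and the function names are hypothetical. The sampled range is an inner approximation: the true range of plausible weights contains it and may be wider.

```python
import math
import random
from itertools import product

STRATEGY_SETS = [["U", "M", "D"], ["L", "C", "R"]]
EXACT = {
    ("U", "L"): (2, 1), ("U", "C"): (1, 2), ("U", "R"): (0, 0),
    ("M", "L"): (1, 2), ("M", "C"): (2, 1), ("M", "R"): (1, 0),
    ("D", "L"): (0, 0), ("D", "C"): (0, 1), ("D", "R"): (2, 2),
}
INTERVALS = {s: tuple((v, v) for v in p) for s, p in EXACT.items()}
INTERVALS[("U", "L")] = ((2, 2), (1, 2))  # uncertain Player 2 payoffs, as on slide 7
INTERVALS[("U", "C")] = ((1, 1), (1, 2))


def alpha_rank(payoffs, strategy_sets, alpha=0.1, m=50):
    """Stationary distribution of an alpha-Rank-style multi-population Markov chain."""
    profiles = list(product(*strategy_sets))
    idx = {s: i for i, s in enumerate(profiles)}
    n = len(profiles)
    eta = 1.0 / sum(len(S) - 1 for S in strategy_sets)
    Q = [[0.0] * n for _ in range(n)]  # generator C - I
    for s in profiles:
        for k, strategies in enumerate(strategy_sets):
            for alt in strategies:
                if alt == s[k]:
                    continue
                t = s[:k] + (alt,) + s[k + 1:]
                gain = payoffs[t][k] - payoffs[s][k]
                rho = 1.0 / m if gain == 0 else math.expm1(-alpha * gain) / math.expm1(-m * alpha * gain)
                Q[idx[s]][idx[t]] += eta * rho
                Q[idx[s]][idx[s]] -= eta * rho
    # Solve Q^T pi = 0 with the last equation replaced by sum(pi) = 1.
    M = [[Q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    M[-1] = [1.0] * n + [1.0]
    for col in range(n):  # Gaussian elimination with partial pivoting
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    pi = [0.0] * n
    for r in reversed(range(n)):
        pi[r] = (M[r][n] - sum(M[r][c] * pi[c] for c in range(r + 1, n))) / M[r][r]
    return dict(zip(profiles, pi))


def sampled_weight_ranges(intervals, strategy_sets, num_samples=200, seed=0):
    """Min/max alpha-Rank weight over payoff tables drawn uniformly from the intervals."""
    rng = random.Random(seed)
    lo, hi = {}, {}
    for _ in range(num_samples):
        payoffs = {s: tuple(rng.uniform(a, b) for a, b in bounds)
                   for s, bounds in intervals.items()}
        for s, w in alpha_rank(payoffs, strategy_sets).items():
            lo[s] = min(lo.get(s, w), w)
            hi[s] = max(hi.get(s, w), w)
    return lo, hi


lo, hi = sampled_weight_ranges(INTERVALS, STRATEGY_SETS)
```

Comparing `lo` and `hi` across profiles shows whether the payoff uncertainty is large enough to change which agent could be ranked first.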

  10. Contributions
     1. Static sample complexity bounds quantifying the number of interactions needed to rank agents confidently
     2. An algorithm that adaptively simulates the agent interactions most informative for ranking
     3. An analysis of how payoff uncertainty propagates to the final computed rankings
     ● Summary: sample complexity guarantees and an efficient algorithm for bounding rankings given payoff uncertainty
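For intuition about static sample complexity, here is a standard Hoeffding-plus-union-bound calculation; it is not the bound derived in the paper. It assumes every simulated payoff lies in an interval of known width and that all payoff-table entries are sampled the same number of times. If every estimate is within ε of its true value and ε is below half the smallest gap between payoffs that the response graph compares, each empirical better-response edge points the same way as the true one. Function names are hypothetical.

```python
import math


def hoeffding_radius(n, value_range, delta):
    """Half-width t with P(|sample mean - true mean| >= t) <= delta, for n i.i.d.
    samples lying in an interval of width value_range (Hoeffding's inequality)."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def samples_per_entry(epsilon, value_range, delta, num_entries):
    """Smallest common n for which the Hoeffding radius at level delta / num_entries
    is at most epsilon, so all entries are epsilon-accurate with prob. >= 1 - delta."""
    return math.ceil(value_range ** 2 * math.log(2.0 * num_entries / delta)
                     / (2.0 * epsilon ** 2))


# 3x3 example: 9 profiles x 2 players = 18 entries; sampled payoffs assumed to lie in [0, 2].
# Compared payoffs differ by at least 1, so any epsilon < 0.5 preserves every edge direction.
n = samples_per_entry(epsilon=0.45, value_range=2.0, delta=0.05, num_entries=18)
print(n)  # 65
```

A static bound like this samples every entry equally, including entries whose comparisons are already clear-cut; that is the inefficiency an adaptive sampling algorithm aims to remove.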

  11. Details and evaluations at poster #220!
