Multiagent Evaluation under Incomplete Information



SLIDE 1

Multiagent Evaluation under Incomplete Information

Mark Rowland*, Shayegan Omidshafiei*, Karl Tuyls, Julien Pérolat, Michal Valko, Georgios Piliouras†, Rémi Munos

*Equal contributors †Singapore University of Technology and Design

SLIDE 2

Motivation

  • Problem of interest: multiagent evaluation under incomplete information (>2-player, general-sum games with noisy payoffs)
  • Prototypical application: multiagent iterative training
    1. Train agents via simulations in the underlying game
    2. Construct meta-game comparing performance of all agent match-ups
    3. Evaluate (i.e., rank or score) agents in the meta-game

[Figure labels: Training, Playing, Game simulation, Meta-game synthesis, Estimated payoff table, Agent evaluation algorithm, Estimated ranking vector]
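A minimal sketch of the meta-game synthesis step, assuming a two-player meta-game over three agents and Gaussian payoff noise. The ground-truth payoffs, the simulate_match helper and the noise level are all hypothetical stand-ins for a real game simulator. With more than two players the table becomes a higher-order tensor, but the averaging is the same.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
NUM_AGENTS = 3

# Hypothetical ground-truth expected payoffs (illustration only, not real data).
# true_payoffs[i, j] = (payoff to player 1, payoff to player 2) for agent i vs agent j.
true_payoffs = rng.uniform(0.0, 1.0, size=(NUM_AGENTS, NUM_AGENTS, 2))


def simulate_match(i, j, noise_std=0.5):
    """Hypothetical simulator: one noisy payoff pair for agent i vs agent j."""
    return true_payoffs[i, j] + rng.normal(0.0, noise_std, size=2)


def estimate_meta_game(num_samples):
    """Average repeated noisy match outcomes into an estimated payoff table."""
    estimate = np.zeros_like(true_payoffs)
    for i, j in itertools.product(range(NUM_AGENTS), repeat=2):
        samples = np.array([simulate_match(i, j) for _ in range(num_samples)])
        estimate[i, j] = samples.mean(axis=0)
    return estimate


estimated = estimate_meta_game(num_samples=2000)
max_error = np.abs(estimated - true_payoffs).max()
print(f"largest absolute estimation error: {max_error:.3f}")
```

The estimated table is never exact, which is why the ranking step has to account for the remaining payoff uncertainty.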

SLIDE 6

Multiagent Evaluation at a Glance

α-Rank Overview

1. Construct response graph capturing player-wise evolutionary deviations: graph over the pure strategy profiles, with a directed edge wherever the deviating player's new strategy is a better response
2. Perturb the response graph → evolutionary mutations ensuring a unique stationary distribution
3. Stationary distribution masses → α-Rank

Payoff table:

                Player 2
                L      C      R
Player 1   U    2, 1   1, 2   0, 0
           M    1, 2   2, 1   1, 0
           D    0, 0   0, 1   2, 2

Response graph nodes (pure strategy profiles): (U,L), (U,C), (U,R), (M,L), (M,C), (M,R), (D,L), (D,C), (D,R)
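The three steps can be sketched on the 3x3 example as follows. This is a simplified illustration, not the full α-Rank transition model. It assumes each table entry reads (Player 1 payoff, Player 2 payoff), and it assumes a perturbed chain in which a deviation is followed with probability eta(1 - eps) if it improves the deviating player's payoff, eta*eps if it worsens it, and eta/2 on ties, where eta = 1/4 is one over the number of unilateral deviations. The value eps = 0.01 is arbitrary.

```python
import itertools
import numpy as np

ROWS, COLS = "UMD", "LCR"
# payoffs[i, j] = (Player 1 payoff, Player 2 payoff); entry order is an assumption.
payoffs = np.array([
    [[2, 1], [1, 2], [0, 0]],
    [[1, 2], [2, 1], [1, 0]],
    [[0, 0], [0, 1], [2, 2]],
], dtype=float)

profiles = list(itertools.product(range(3), range(3)))
index = {p: n for n, p in enumerate(profiles)}


def deviations(profile):
    """Yield (player, new_profile) for every unilateral deviation."""
    i, j = profile
    for i2 in range(3):
        if i2 != i:
            yield 0, (i2, j)
    for j2 in range(3):
        if j2 != j:
            yield 1, (i, j2)


def response_graph():
    """Step 1: directed edge s -> t if the deviating player strictly gains."""
    return {(s, t) for s in profiles for player, t in deviations(s)
            if payoffs[t][player] > payoffs[s][player]}


def transition_matrix(eps=0.01):
    """Step 2: perturbed chain (assumed form) in which every deviation has
    positive probability, so the stationary distribution is unique."""
    eta = 1.0 / 4.0  # four unilateral deviations per profile
    P = np.zeros((9, 9))
    for s in profiles:
        for player, t in deviations(s):
            gain = payoffs[t][player] - payoffs[s][player]
            weight = 1 - eps if gain > 0 else eps if gain < 0 else 0.5
            P[index[s], index[t]] = eta * weight
        P[index[s], index[s]] = 1.0 - P[index[s]].sum()  # self-loop
    return P


def stationary(P):
    """Step 3: solve pi P = pi together with sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


edges = response_graph()
pi = stationary(transition_matrix(eps=0.01))
for (i, j), mass in zip(profiles, pi):
    print(f"({ROWS[i]},{COLS[j]}): {mass:.3f}")
```

In the unperturbed graph, (D,R) has no outgoing edges and the four profiles over {U,M} x {L,C} form a closed cycle, so a random walk on that graph has more than one stationary distribution. The perturbation in step 2 removes that ambiguity.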

SLIDE 7

Multiagent Evaluation at a Glance

α-Rank Overview

1. Construct response graph capturing player-wise evolutionary deviations: graph over the pure strategy profiles, with a directed edge wherever the deviating player's new strategy is a better response
2. Perturb the response graph → evolutionary mutations ensuring a unique stationary distribution
3. Stationary distribution masses → α-Rank

Payoff table with interval-valued entries:

                Player 2
                L          C          R
Player 1   U    2, [1,2]   1, [1,2]   0, 0
           M    1, 2       2, 1       1, 0
           D    0, 0       0, 1       2, 2

Response graph nodes (pure strategy profiles): (U,L), (U,C), (U,R), (M,L), (M,C), (M,R), (D,L), (D,C), (D,R)
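To see how the interval entries can change the outcome, one naive option is to sweep a few values inside each interval and record the smallest and largest stationary mass per profile. This brute-force sweep is only an illustration and is not the efficient method the talk refers to. It assumes the intervals apply to Player 2's payoffs at (U,L) and (U,C), and it uses a simplified perturbed chain whose form (eta = 1/4, eps = 0.01, weight 1/2 on ties) is assumed for this sketch.

```python
import itertools
import numpy as np

ROWS, COLS = "UMD", "LCR"


def stationary_masses(payoffs, eps=0.01):
    """Stationary distribution of a simplified perturbed response-graph chain
    (assumed form: improving deviations get eta*(1-eps), worsening ones eta*eps)."""
    profiles = list(itertools.product(range(3), range(3)))
    index = {p: n for n, p in enumerate(profiles)}
    eta = 1.0 / 4.0
    P = np.zeros((9, 9))
    for (i, j) in profiles:
        neighbours = [(0, (i2, j)) for i2 in range(3) if i2 != i]
        neighbours += [(1, (i, j2)) for j2 in range(3) if j2 != j]
        for player, t in neighbours:
            gain = payoffs[t][player] - payoffs[i, j][player]
            weight = 1 - eps if gain > 0 else eps if gain < 0 else 0.5
            P[index[i, j], index[t]] = eta * weight
        P[index[i, j], index[i, j]] = 1.0 - P[index[i, j]].sum()
    A = np.vstack([P.T - np.eye(9), np.ones((1, 9))])
    b = np.append(np.zeros(9), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


# Point payoffs from the slide; the two uncertain entries are overwritten below.
base = np.array([
    [[2, 1.5], [1, 1.5], [0, 0]],
    [[1, 2], [2, 1], [1, 0]],
    [[0, 0], [0, 1], [2, 2]],
], dtype=float)

grid = [1.0, 1.5, 2.0]  # candidate values inside the interval [1, 2]
masses = []
for ul, uc in itertools.product(grid, grid):
    candidate = base.copy()
    candidate[0, 0, 1] = ul  # Player 2's payoff at (U,L)
    candidate[0, 1, 1] = uc  # Player 2's payoff at (U,C)
    masses.append(stationary_masses(candidate))
masses = np.array(masses)

lower, upper = masses.min(axis=0), masses.max(axis=0)
for n, (i, j) in enumerate(itertools.product(range(3), range(3))):
    print(f"({ROWS[i]},{COLS[j]}): [{lower[n]:.3f}, {upper[n]:.3f}]")
```

In this sketch the chain depends only on the ordering of payoffs, so the three grid values already cover every possible direction of the edge between (U,L) and (U,C). When that edge flips, (U,L) changes from one node of a four-profile cycle into a profile with no improving deviation, and its stationary mass shifts substantially.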

SLIDE 9

From Uncertainty in Payoffs to Rankings

  • Key question: given confidence bounds on the payoff table entries, can we efficiently compute a range of plausible α-Rank weights for the agents?
  • Takeaway: need careful consideration of payoff uncertainties when ranking agents

[Figure annotation: top-ranked agent when no payoff uncertainty]
SLIDE 10

Contributions

  1. Static sample complexity bounds quantifying # of interactions needed to confidently rank agents
  2. Algorithm that adaptively simulates agent interactions that are most informative for ranking
  3. Analysis of the propagation of payoff uncertainty to the final rankings computed

  • Sample complexity guarantees & efficient algorithm for bounding rankings given payoff uncertainty
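For intuition on what a static sample complexity bound looks like, here is a generic Hoeffding-plus-union-bound calculation. It is a textbook bound, not the bound derived in this work, and it assumes bounded payoffs and independent samples per table entry. The function name and parameters are illustrative.

```python
import math


def hoeffding_samples(eps, delta, num_entries, payoff_range=1.0):
    """Samples per payoff entry so that, with probability at least 1 - delta,
    every one of num_entries estimated means is within eps of its true value.

    Uses Hoeffding's inequality for payoffs confined to an interval of width
    payoff_range, with a union bound over all entries.
    """
    return math.ceil(
        payoff_range ** 2 * math.log(2 * num_entries / delta) / (2 * eps ** 2)
    )


# Example: 2-player, 3x3 meta-game, so 9 profiles x 2 players = 18 entries.
n = hoeffding_samples(eps=0.1, delta=0.05, num_entries=18)
print(f"samples per entry: {n}")
```

If every entry is known to within less than half of the smallest payoff gap between neighbouring profiles, the direction of every response-graph edge is determined, which is what a ranking based on the response graph needs.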
SLIDE 11

Details & evaluations at poster #220!