

SLIDE 1

ROMA: Multi-Agent Reinforcement Learning with Emerging Roles

Tonghan Wang, Heng Dong, Victor Lesser, Chongjie Zhang Tsinghua University, UMass Amherst

SLIDE 2

Multi-Agent Systems

[Images: Multi-Agent Assembly; Robot Football Game]

SLIDE 3

One Major Challenge of Achieving Efficient MARL

  • Exponential blow-up of the state-action space

– The state-action space grows exponentially with the number of agents: e.g., 10 agents with 5 actions each already induce 5^10 ≈ 9.8 million joint actions.
– Learning a centralized strategy is not scalable.

  • Solution:

– Learning decentralized value functions or policies.

SLIDE 4

Decentralized Learning

  • Separate learning

– High learning complexity: agents often perform similar sub-tasks, yet each must learn them independently from scratch.

  • Shared learning

– Share decentralized policies or value functions;
– Adopted by most algorithms;
– Can accelerate training.

SLIDE 5

Drawbacks of Shared Learning

  • Parameter sharing

– Use a single shared policy to solve the whole task.
– Inefficient in complex tasks (cf. Adam Smith's pin factory: division of labor outperforms everyone doing every step).

  • An important direction of MARL

– Complex multi-agent cooperation needs sub-task specialization.
– Dynamic sharing of learning among agents responsible for the same sub-task.

SLIDE 6

Drawing Inspiration from Natural Systems

  • Ants

– Division of labor

  • Humans

– Share experience among people with the same vocation.

SLIDE 7

Role-Based Multi-Agent Systems

  • Previous work

– The complexity of agent design is reduced via task decomposition.
– Roles and their associated responsibilities (sets of sub-tasks) are predefined.

  • ROMA

– Incorporate role learning into multi-agent reinforcement learning.

SLIDE 8
Outline

  • 1. Motivation
  • 2. Method
  • 3. Results and Discussion

SLIDE 9

Our Idea

  • Learn sub-task specialization.
  • Let agents responsible for similar sub-tasks have similar policies and share their learning.
  • Introduce roles.

[Diagram: roles link policies to sub-task specialization]

SLIDE 10

Our Method

  • Connection between roles and policies

– Generating role embeddings with a role encoder conditioned on local observations (see the code sketch after this list);
– Conditioning agents' policies on individual roles.

  • Connection between roles and behaviors

– We propose two regularizers to enable roles to be:

  • Identifiable by behaviors
  • Specialized in certain sub-tasks
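
As a concrete illustration of the two connections above, here is a minimal PyTorch sketch. This is not the authors' implementation: all class names, layer sizes, and the plain concatenation of role and observation are assumptions; in the paper, the role conditions the policy through a network that generates the policy's parameters.

```python
import torch
import torch.nn as nn

class RoleEncoder(nn.Module):
    """Maps a local observation to a Gaussian over role embeddings,
    p(rho_i | o_i). Sizes are illustrative placeholders."""
    def __init__(self, obs_dim: int, role_dim: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, role_dim)
        self.log_std = nn.Linear(hidden, role_dim)

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        h = self.trunk(obs)
        # rsample() on the returned distribution yields reparameterized
        # role samples, so gradients flow back into the encoder.
        return torch.distributions.Normal(self.mu(h), self.log_std(h).exp())

class RoleConditionedPolicy(nn.Module):
    """Local policy/utility network conditioned on the sampled role.
    Here the role is simply concatenated with the observation for
    brevity; this is a simplification of the paper's design."""
    def __init__(self, obs_dim: int, role_dim: int, n_actions: int,
                 hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + role_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs: torch.Tensor, role: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, role], dim=-1))  # per-action values
```

The Gaussian output makes the role embedding a distribution rather than a point, which is what lets the two regularizers introduced next shape the embedding space.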
SLIDE 11

Identifiable Roles

  • We propose a regularizer to maximize the conditional mutual information $I(\rho_i^t;\, \tau_i^{t-1} \mid o_i^t)$ between an agent's role and its observation-action history, given its current observation.
  • A variational lower bound:

$$I(\rho_i^t;\, \tau_i^{t-1} \mid o_i^t) \;\ge\; \mathbb{E}\!\left[\log \frac{q_\xi(\rho_i^t \mid \tau_i^{t-1}, o_i^t)}{p(\rho_i^t \mid o_i^t)}\right]$$

  • In practice, we optimize (sketched in code below):

$$\mathcal{L}_I(\theta_\rho, \xi) = \mathbb{E}_{(\tau_i^{t-1},\, o_i^t) \sim \mathcal{D}}\!\left[ \mathrm{CE}\!\left[\, p(\rho_i^t \mid o_i^t) \,\big\|\, q_\xi(\rho_i^t \mid \tau_i^{t-1}, o_i^t) \,\right] - H(\rho_i^t \mid o_i^t) \right]$$

which is exactly the KL divergence $D_{\mathrm{KL}}\!\left( p(\rho_i^t \mid o_i^t) \,\|\, q_\xi(\rho_i^t \mid \tau_i^{t-1}, o_i^t) \right)$.
SLIDE 12
Specialized Roles

  • We expect that, for any two agents,

– Either they have similar roles;
– Or they have different behaviors, which are characterized by the local observation-action history.

  • However,

– Which agents have similar roles?
– How to measure the dissimilarity between agents' behaviors?

SLIDE 13
Specialized Roles

  • To solve this problem, we

– Introduce a learnable dissimilarity model $d_\varphi$;
– For each pair of agents $i$ and $j$, seek to maximize $I(\rho_i^t;\, \tau_j^{t-1} \mid o_j^t) + d_\varphi(\tau_i^{t-1}, \tau_j^{t-1})$;
– Seek to minimize $\|D_\varphi^t\|_{2,0}$, the number of non-zero elements in $D_\varphi^t = \left(d_{ij}^t\right)$, where $d_{ij}^t = d_\varphi(\tau_i^{t-1}, \tau_j^{t-1})$.

SLIDE 14
Specialized Roles

  • Formally, we propose the following role embedding learning problem to encourage sub-task specialization:

$$\begin{aligned}
\underset{\theta_\rho,\, \varphi,\, \xi}{\text{minimize}}\quad & \|D_\varphi^t\|_{2,0} \\
\text{subject to}\quad & I(\rho_i^t;\, \tau_j^{t-1} \mid o_j^t) + d_\varphi(\tau_i^{t-1}, \tau_j^{t-1}) > U, \quad \forall\, i \neq j
\end{aligned}$$

  • The specialization loss (one possible implementation is sketched below):

$$\mathcal{L}_D(\theta_\rho, \varphi, \xi) = \mathbb{E}_{(\tau^{t-1},\, o^t) \sim \mathcal{D},\; \rho^t \sim p(\cdot \mid o^t)}\!\left[ \|D_\varphi^t\| - \sum_{i \neq j} \min\!\left\{ q_\xi(\rho_i^t \mid \tau_j^{t-1}, o_j^t) + d_\varphi(\tau_i^{t-1}, \tau_j^{t-1}),\; U \right\} \right]$$
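
One way the dissimilarity model and the clipped pairwise loss could look in PyTorch; a sketch under stated assumptions, not the paper's exact choices: the L1 penalty stands in for the non-differentiable $\ell_{2,0}$ term, and the log-density of $q_\xi$ stands in for its value inside the clip.

```python
import torch
import torch.nn as nn

class DissimilarityModel(nn.Module):
    """Learnable dissimilarity d_phi over pairs of trajectory embeddings."""
    def __init__(self, traj_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * traj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, traj_i, traj_j):
        return self.net(torch.cat([traj_i, traj_j], dim=-1)).squeeze(-1)

def specialization_loss(roles, traj_embs, obs, q_posterior, d_phi, U=1.0):
    """L_D sketch for one timestep. roles: [n_agents, role_dim];
    traj_embs: [n_agents, traj_dim]; obs: [n_agents, obs_dim].
    For every ordered pair i != j, clip q_xi(rho_i | tau_j, o_j)
    + d_phi(tau_i, tau_j) at U and maximize the sum, while penalizing
    the magnitude of the dissimilarity entries (an L1 surrogate for
    the sparsity term)."""
    n = roles.shape[0]
    pair_terms, d_entries = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            q_ij = q_posterior(traj_embs[j], obs[j])  # q_xi(. | tau_j, o_j)
            log_q = q_ij.log_prob(roles[i]).sum(-1)   # does tau_j identify rho_i?
            d_ij = d_phi(traj_embs[i], traj_embs[j])
            pair_terms.append(torch.clamp(log_q + d_ij, max=U))
            d_entries.append(d_ij.abs())
    return torch.stack(d_entries).sum() - torch.stack(pair_terms).sum()
```

The clip at $U$ is what allows each pair to satisfy the constraint either way: once similar roles (high $q_\xi$) or dissimilar behaviors (high $d_\varphi$) push the sum past $U$, the pair contributes no further gradient.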

SLIDE 15

Overall Optimization Objective

– $\mathcal{L}(\theta) = \mathcal{L}_{TD}(\theta) + \lambda_I \mathcal{L}_I(\theta_\rho, \xi) + \lambda_D \mathcal{L}_D(\theta_\rho, \xi, \varphi)$, where $\lambda_I$ and $\lambda_D$ are scaling weights for the two role regularizers.
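
In training code this is just a weighted sum; a trivial sketch with placeholder weights (not the paper's reported values):

```python
def overall_loss(td_loss, loss_i, loss_d, lambda_i=1e-4, lambda_d=1e-2):
    """Combine the TD loss of the underlying value-based learner with
    the identifiability and specialization regularizers. The lambda
    values here are illustrative placeholders."""
    return td_loss + lambda_i * loss_i + lambda_d * loss_d
```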

SLIDE 16
Outline

  • 1. Motivation
  • 2. Method
  • 3. Results and Discussion

SLIDE 17

State-of-the-art performance on the SMAC benchmark

SLIDE 18

The SMAC Challenge

SLIDE 19

Ablation Study

SLIDE 20

Ablation Study

SLIDE 21

Role Representations

SLIDE 22

Dynamic Roles

[Figure: role embeddings at t = 1, t = 8, t = 19, t = 27]

SLIDE 23

Specialized Roles

  • Learnable dissimilarity model:

– Map: MMM2;
– Different unit types have different roles;
– Learned dissimilarity between trajectories of different unit types: 0.9556 ± 0.0009;
– Learned dissimilarity between trajectories of the same unit type: 0.0780 ± 0.0019.

SLIDE 24

Specialized Roles

SLIDE 25

Multi-Agent Reinforcement Learning with Emerging Roles

SLIDE 26

Role Emergence

SLIDE 27

Role Emergence

SLIDE 28

Game Replays

SLIDE 29

27m_vs_30m (27 Marines vs. 30 Marines)

SLIDE 30

For more experimental results, please visit our website:

  • https://sites.google.com/view/romarl