A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks - PowerPoint PPT Presentation




slide-1
SLIDE 1

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

Unnat Jain1*, Luca Weihs2*, Eric Kolve2, Ali Farhadi3, Svetlana Lazebnik1, Aniruddha Kembhavi2,3, Alexander Schwing1

* Equal contribution by UJ and LW

1 University of Illinois at Urbana-Champaign, 2 Allen Institute for AI, 3 University of Washington

ECCV 2020 (Spotlight)

Code, data, and pretrained models at: https://unnat.github.io/cordial-sync/

slide-2
SLIDE 2

Continuous coordination task

  • 1. Furniture Moving for embodied agents
slide-3
SLIDE 3

MARL beyond marginal policies

  • 2. Cordial SYNC policies
slide-4
SLIDE 4

Preview of contributions

  • 1. Furniture Moving task
  • 2. Decentralized MARL beyond marginal policies

slide-5
SLIDE 5

FurnMove Task

FurnLift Task

Jain* and Weihs* et al. “Two Body Problem: Collaborative Visual Task Completion” in CVPR 2019

slide-6
SLIDE 6

FurnMove Task

slide-7
SLIDE 7

Centralized MARL

slide-8
SLIDE 8

Centralized MARL

Expressive, but introduces issues:

  • Joint policy and model complexity scale exponentially with the number of agents
  • Requires a high-bandwidth communication channel
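The exponential blow-up of the centralized approach can be made concrete with a quick sketch (the function name is ours, not from the talk): a centralized policy must output a distribution over every combination of per-agent actions.

```python
# Illustrative sketch: with A actions per agent, a centralized controller must
# represent a distribution over A**n joint actions for n agents, so its output
# head grows exponentially with the number of agents.

def joint_action_space_size(actions_per_agent: int, num_agents: int) -> int:
    """Size of the joint action space a centralized policy must cover."""
    return actions_per_agent ** num_agents

# FurnMove has 13 actions per agent; with 2 agents the centralized policy
# already outputs a distribution over 169 joint actions.
print(joint_action_space_size(13, 2))  # 169
print(joint_action_space_size(13, 4))  # 28561
```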

slide-9
SLIDE 9

Decentralized MARL

slide-10
SLIDE 10

Decentralized MARL

slide-11
SLIDE 11

Decentralized MARL

Previous methods use a single marginal policy per agent, so the effective joint policy is rank-1.

slide-12
SLIDE 12

One policy per agent (rank-1)

Marginal agents: represent marginal policies and sample from them independently. The effective joint policy is the outer product Π = π¹ ⊗ π², which is rank 1.

Central agent: represents and samples from the joint policy Π* directly.

[Figure: joint-policy grids over Agent 1's × Agent 2's actions, comparing the rank-1 product π¹ ⊗ π² against a rank-2 target, with per-entry L1 error.]
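The rank-1 limitation can be seen in a few lines of NumPy (the numbers below are our own minimal example, not the values from the slide): independently sampled marginals induce a joint policy that is an outer product, and no outer product can put mass only on coordinated action pairs.

```python
import numpy as np

# Two marginal policies over 3 actions each; sampling them independently
# induces the joint policy pi1 ⊗ pi2, which is a rank-1 matrix.
pi1 = np.array([0.2, 0.5, 0.3])        # Agent 1's marginal policy
pi2 = np.array([0.6, 0.1, 0.3])        # Agent 2's marginal policy
joint = np.outer(pi1, pi2)             # effective joint policy

print(np.linalg.matrix_rank(joint))    # 1
print(np.isclose(joint.sum(), 1.0))    # True: still a valid distribution

# A "coordinated" target that puts all mass on matching action pairs is
# full-rank and therefore unreachable by any single outer product.
target = np.diag([1 / 3, 1 / 3, 1 / 3])
print(np.linalg.matrix_rank(target))   # 3
```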

slide-13
SLIDE 13

Many policies per agent (high-rank)

Agent 1's policies π¹_1, …, π¹_m and Agent 2's policies π²_1, …, π²_m are combined with shared mixture weights α_1, …, α_m.

Mixture-of-Marginals:

Σ_{j=1}^{m} α_j · (π¹_j ⊗ π²_j) = α_1 · (π¹_1 ⊗ π²_1) + … + α_m · (π¹_m ⊗ π²_m)

[Figure: example mixture weights and per-agent policies combining, term by term, into the target joint-policy grid.]
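A minimal sketch of the mixture-of-marginals construction, with made-up weights and random policies: each agent keeps m marginal policies, and shared weights α mix the m rank-1 outer products into a joint policy of rank up to m.

```python
import numpy as np

m, num_actions = 3, 3
rng = np.random.default_rng(0)

alpha = np.array([0.5, 0.3, 0.2])                    # mixture weights, sum to 1
pis_1 = rng.dirichlet(np.ones(num_actions), size=m)  # Agent 1's m policies
pis_2 = rng.dirichlet(np.ones(num_actions), size=m)  # Agent 2's m policies

# Pi = sum_j alpha_j * (pi1_j ⊗ pi2_j)
joint = sum(a * np.outer(p1, p2) for a, p1, p2 in zip(alpha, pis_1, pis_2))

print(np.isclose(joint.sum(), 1.0))   # True: convex mix of distributions
print(np.linalg.matrix_rank(joint))   # at most m
```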

slide-14
SLIDE 14

SYNC-Policies

Marginal agents

slide-15
SLIDE 15

SYNC-Policies

Mixture head

slide-16
SLIDE 16

SYNC-Policies

Generate m policies per agent

slide-17
SLIDE 17

SYNC-Policies

Use communication symbols

slide-18
SLIDE 18

SYNC-Policies

Generate mixture weights

slide-19
SLIDE 19

SYNC-Policies

Synchronized sampling

slide-20
SLIDE 20

SYNC-Policies

Select the same policy j across agents, yielding a high-rank effective joint policy
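The synchronized-sampling step above can be sketched as follows. This is our simplification of SYNC-policies: both agents compute the same mixture weights α (in the paper this agreement comes from a low-bandwidth communication round), use shared randomness to pick the same component index j, then independently sample their own actions from their j-th marginals; the explicit seed mechanics and names here are ours.

```python
import numpy as np

def sync_sample(alpha, my_policies, shared_seed, local_seed):
    """Pick component j with shared randomness, then a local action from pi_j."""
    j = np.random.default_rng(shared_seed).choice(len(alpha), p=alpha)
    pi_j = my_policies[j]
    action = np.random.default_rng(local_seed).choice(len(pi_j), p=pi_j)
    return j, action

alpha = np.array([0.5, 0.3, 0.2])
pis_1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # Agent 1's 3 policies
pis_2 = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])  # Agent 2's 3 policies

shared = 42  # e.g. derived from the symbols communicated at this timestep
j1, a1 = sync_sample(alpha, pis_1, shared, local_seed=1)
j2, a2 = sync_sample(alpha, pis_2, shared, local_seed=2)
print(j1 == j2)  # True: both agents sampled the same mixture component
```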

slide-21
SLIDE 21

FurnMove Task

slide-22
SLIDE 22

FurnMove Task

Agents must

  • Remain near the TV
  • Move the TV together
slide-23
SLIDE 23

FurnMove Task

slide-24
SLIDE 24

FurnMove Task

slide-25
SLIDE 25

156/169 ≈ 92.3% of action pairs will always fail.

Action space (13 actions per agent):

  • Single-agent navigation: MoveAhead, RotateLeft, RotateRight, Pass
  • MoveWithObject (MWO): MWOAhead, MWORight, MWOLeft, MWOBack
  • MoveObject (MO): MOAhead, MORight, MOLeft, MOBack
  • RotateObjectRight

(Details in the paper)
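A quick back-of-the-envelope check of the slide's statistic (our own arithmetic, not from the talk): with 13 actions per agent there are 13 × 13 joint pairs, and if 156 of them always fail, only 13 pairs are ever viable.

```python
# Verify the "156/169 ≈ 92.3%" figure from the slide.
total_pairs = 13 * 13
always_fail = 156
viable = total_pairs - always_fail

print(viable)                               # 13
print(round(always_fail / total_pairs, 3))  # 0.923, i.e. ~92.3% always fail
```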

slide-26
SLIDE 26

Qualitative runs

Top-down view legend:

  • Agent 1 trajectory in red
  • Agent 2 trajectory in green
  • TV trajectory in blue
  • Triangles denote field of view & orientation
  • TV and goal positions are labeled

slide-27
SLIDE 27

Marginal Agents

Agent 1's view, Agent 2's view, and top-down view (not available to agents)

slide-28
SLIDE 28

Cordial SYNC Agents

Agent 1's view, Agent 2's view, and top-down view (not available to agents)

slide-29
SLIDE 29

Cordial SYNC Agents

Agent 1's view, Agent 2's view, and top-down view (not available to agents)

slide-30
SLIDE 30

Quantitative results

  • Cordial SYNC agents train as well as the central agent
  • Marginal agents train poorly, and worsen without communication
  • Agents generalize well (with scope for improvement)

slide-31
SLIDE 31

Summary

slide-32
SLIDE 32

Summary

  • 1. Rank-1 restriction of marginal agents

Marginal agents: the effective joint policy Π = π¹ ⊗ π² is rank 1.

[Figure: per-entry L1 error of the rank-1 joint policy against the target.]

slide-33
SLIDE 33

Summary

  • 1. Rank-1 restriction of marginal agents
  • 2. Mixture-of-marginals

Σ_{j=1}^{m} α_j · (π¹_j ⊗ π²_j) = α_1 · (π¹_1 ⊗ π²_1) + … + α_m · (π¹_m ⊗ π²_m)

[Figure: the mixture terms combining into the target joint-policy grid.]

slide-34
SLIDE 34

Summary

  • 1. Rank-1 restriction of marginal agents

  • 2. Mixture-of-marginals
  • 3. SYNC policies
slide-35
SLIDE 35

Summary

  • 1. Rank-1 restriction of marginal agents

  • 2. Mixture-of-marginals
  • 3. SYNC policies
  • 4. FurnMove task
slide-36
SLIDE 36

Summary

  • 1. Rank-1 restriction of marginal agents

  • 2. Mixture-of-marginals
  • 3. SYNC policies
  • 4. FurnMove task
  • 5. Qualitative results
slide-37
SLIDE 37

More in the paper: interpreting communication, joint policy visualizations, mirrored gridworld agents, and detailed evaluation.

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
https://unnat.github.io/cordial-sync/

  • d. Communication analysis

[Figure: reply weights over the steps in an episode, with markers where Agent 1 or Agent 2 attempted a MoveWithObject action or took a Pass action; Cordial SYNC vs. Marginal (prior).]

Join our live QA / Zoom sessions.