A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks - PowerPoint PPT Presentation




slide-1
SLIDE 1

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

Unnat Jain1*, Luca Weihs2*, Eric Kolve2, Ali Farhadi3, Svetlana Lazebnik1, Aniruddha Kembhavi2,3, Alexander Schwing1

* Equal contribution by UJ and LW

1 University of Illinois at Urbana-Champaign, 2 Allen Institute for AI, 3 University of Washington

ECCV 2020 (Spotlight)

Code, data, and pretrained models at: https://unnat.github.io/cordial-sync/

slide-2
SLIDE 2

Continuous coordination task

  • 1. Furniture Moving for embodied agents
slide-3
SLIDE 3

MARL beyond marginal policies

  • 2. Cordial SYNC policies
slide-4
SLIDE 4

Preview of contributions

  • 1. Furniture Moving task
  • 2. Decentralized MARL beyond marginal policies

slide-5
SLIDE 5

FurnMove Task

FurnLift Task

Jain* and Weihs* et al. “Two Body Problem: Collaborative Visual Task Completion” in CVPR 2019

slide-6
SLIDE 6

FurnMove Task

slide-7
SLIDE 7

Centralized MARL

slide-8
SLIDE 8

Centralized MARL

Expressive, but introduces issues:

  • Joint policy and model complexity scale exponentially with the number of agents
  • Requires a high-bandwidth communication channel
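The exponential blow-up of the centralized approach can be made concrete with a quick sketch (the function name is ours, not from the talk): a centralized policy must output a distribution over every combination of per-agent actions.

```python
# Illustrative sketch: with A actions per agent, a centralized controller must
# represent a distribution over A**n joint actions for n agents, so its output
# head grows exponentially with the number of agents.

def joint_action_space_size(actions_per_agent: int, num_agents: int) -> int:
    """Size of the joint action space a centralized policy must cover."""
    return actions_per_agent ** num_agents

# FurnMove has 13 actions per agent; with 2 agents the centralized policy
# already outputs a distribution over 169 joint actions.
print(joint_action_space_size(13, 2))  # 169
print(joint_action_space_size(13, 4))  # 28561
```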

slide-9
SLIDE 9

Decentralized MARL

slide-10
SLIDE 10

Decentralized MARL

slide-11
SLIDE 11

Decentralized MARL

Previous methods use a single marginal policy per agent, so the effective joint policy is rank-1.

slide-12
SLIDE 12

One policy per agent (rank-1)

Marginal agents: represent marginal policies and sample from them independently. The effective joint policy is the outer product Π = π¹ ⊗ π², which is rank 1.

Central agent: represents and samples from the joint policy Π* directly.

[Figure: joint-policy grids over Agent 1's × Agent 2's actions, comparing the rank-1 product π¹ ⊗ π² against a rank-2 target, with per-entry L1 error.]
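The rank-1 limitation can be seen in a few lines of NumPy (the numbers below are our own minimal example, not the values from the slide): independently sampled marginals induce a joint policy that is an outer product, and no outer product can put mass only on coordinated action pairs.

```python
import numpy as np

# Two marginal policies over 3 actions each; sampling them independently
# induces the joint policy pi1 ⊗ pi2, which is a rank-1 matrix.
pi1 = np.array([0.2, 0.5, 0.3])        # Agent 1's marginal policy
pi2 = np.array([0.6, 0.1, 0.3])        # Agent 2's marginal policy
joint = np.outer(pi1, pi2)             # effective joint policy

print(np.linalg.matrix_rank(joint))    # 1
print(np.isclose(joint.sum(), 1.0))    # True: still a valid distribution

# A "coordinated" target that puts all mass on matching action pairs is
# full-rank and therefore unreachable by any single outer product.
target = np.diag([1 / 3, 1 / 3, 1 / 3])
print(np.linalg.matrix_rank(target))   # 3
```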

slide-13
SLIDE 13

Many policies per agent (high-rank)

Agent 1's policies π¹_1, …, π¹_m and Agent 2's policies π²_1, …, π²_m are combined with shared mixture weights α_1, …, α_m.

Mixture-of-Marginals:

Σ_{j=1}^{m} α_j · (π¹_j ⊗ π²_j) = α_1 · (π¹_1 ⊗ π²_1) + … + α_m · (π¹_m ⊗ π²_m)

[Figure: example mixture weights and per-agent policies combining, term by term, into the target joint-policy grid.]
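A minimal sketch of the mixture-of-marginals construction, with made-up weights and random policies: each agent keeps m marginal policies, and shared weights α mix the m rank-1 outer products into a joint policy of rank up to m.

```python
import numpy as np

m, num_actions = 3, 3
rng = np.random.default_rng(0)

alpha = np.array([0.5, 0.3, 0.2])                    # mixture weights, sum to 1
pis_1 = rng.dirichlet(np.ones(num_actions), size=m)  # Agent 1's m policies
pis_2 = rng.dirichlet(np.ones(num_actions), size=m)  # Agent 2's m policies

# Pi = sum_j alpha_j * (pi1_j ⊗ pi2_j)
joint = sum(a * np.outer(p1, p2) for a, p1, p2 in zip(alpha, pis_1, pis_2))

print(np.isclose(joint.sum(), 1.0))   # True: convex mix of distributions
print(np.linalg.matrix_rank(joint))   # at most m
```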

slide-14
SLIDE 14

SYNC-Policies

Marginal agents

slide-15
SLIDE 15

SYNC-Policies

Mixture head

slide-16
SLIDE 16

SYNC-Policies

Generate m policies per agent

slide-17
SLIDE 17

SYNC-Policies

Use communication symbols

slide-18
SLIDE 18

SYNC-Policies

Generate mixture weights

slide-19
SLIDE 19

SYNC-Policies

Synchronized sampling

slide-20
SLIDE 20

SYNC-Policies

Select the same policy j across agents, yielding a high-rank effective joint policy
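The synchronized-sampling step above can be sketched as follows. This is our simplification of SYNC-policies: both agents compute the same mixture weights α (in the paper this agreement comes from a low-bandwidth communication round), use shared randomness to pick the same component index j, then independently sample their own actions from their j-th marginals; the explicit seed mechanics and names here are ours.

```python
import numpy as np

def sync_sample(alpha, my_policies, shared_seed, local_seed):
    """Pick component j with shared randomness, then a local action from pi_j."""
    j = np.random.default_rng(shared_seed).choice(len(alpha), p=alpha)
    pi_j = my_policies[j]
    action = np.random.default_rng(local_seed).choice(len(pi_j), p=pi_j)
    return j, action

alpha = np.array([0.5, 0.3, 0.2])
pis_1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # Agent 1's 3 policies
pis_2 = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])  # Agent 2's 3 policies

shared = 42  # e.g. derived from the symbols communicated at this timestep
j1, a1 = sync_sample(alpha, pis_1, shared, local_seed=1)
j2, a2 = sync_sample(alpha, pis_2, shared, local_seed=2)
print(j1 == j2)  # True: both agents sampled the same mixture component
```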

slide-21
SLIDE 21

FurnMove Task

slide-22
SLIDE 22

FurnMove Task

Agents must

  • Remain near the TV
  • Move the TV together
slide-23
SLIDE 23

FurnMove Task

slide-24
SLIDE 24

FurnMove Task

slide-25
SLIDE 25

156/169 ≈ 92.3% of action pairs will always fail.

Action space (13 actions per agent):

  • Single-agent navigation: MoveAhead, RotateLeft, RotateRight, Pass
  • MoveWithObject (MWO): MWOAhead, MWORight, MWOLeft, MWOBack
  • MoveObject (MO): MOAhead, MORight, MOLeft, MOBack
  • RotateObjectRight

(Details in the paper)
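A quick back-of-the-envelope check of the slide's statistic (our own arithmetic, not from the talk): with 13 actions per agent there are 13 × 13 joint pairs, and if 156 of them always fail, only 13 pairs are ever viable.

```python
# Verify the "156/169 ≈ 92.3%" figure from the slide.
total_pairs = 13 * 13
always_fail = 156
viable = total_pairs - always_fail

print(viable)                               # 13
print(round(always_fail / total_pairs, 3))  # 0.923, i.e. ~92.3% always fail
```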

slide-26
SLIDE 26

Qualitative runs

Top-down view legend:

  • Agent 1 trajectory in red
  • Agent 2 trajectory in green
  • TV trajectory in blue
  • Triangles denote field of view & orientation
  • TV and goal positions are labeled

slide-27
SLIDE 27

Marginal Agents

Agent 1's view, Agent 2's view, and top-down view (not available to agents)

slide-28
SLIDE 28

Cordial SYNC Agents

Agent 1's view, Agent 2's view, and top-down view (not available to agents)

slide-29
SLIDE 29

Cordial SYNC Agents

Agent 1's view, Agent 2's view, and top-down view (not available to agents)

slide-30
SLIDE 30

Quantitative results

  • Cordial SYNC agents train as well as the central agent
  • Marginal agents train poorly, and worsen without communication
  • Agents generalize well (with scope for improvement)

slide-31
SLIDE 31

Summary

slide-32
SLIDE 32

Summary

  • 1. Rank-1 restriction of marginal agents

Marginal agents: the effective joint policy Π = π¹ ⊗ π² is rank 1.

[Figure: per-entry L1 error of the rank-1 joint policy against the target.]

slide-33
SLIDE 33

Summary

  • 1. Rank-1 restriction of marginal agents
  • 2. Mixture-of-marginals

Σ_{j=1}^{m} α_j · (π¹_j ⊗ π²_j) = α_1 · (π¹_1 ⊗ π²_1) + … + α_m · (π¹_m ⊗ π²_m)

[Figure: the mixture terms combining into the target joint-policy grid.]

slide-34
SLIDE 34

Summary

  • 1. Rank-1 restriction of marginal agents

  • 2. Mixture-of-marginals
  • 3. SYNC policies
slide-35
SLIDE 35

Summary

  • 1. Rank-1 restriction of marginal agents

  • 2. Mixture-of-marginals
  • 3. SYNC policies
  • 4. FurnMove task
slide-36
SLIDE 36

Summary

  • 1. Rank-1 restriction of marginal agents

  • 2. Mixture-of-marginals
  • 3. SYNC policies
  • 4. FurnMove task
  • 5. Qualitative results
slide-37
SLIDE 37

More in the paper: interpreting communication, joint policy visualizations, mirrored gridworld agents, and detailed evaluation.

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
https://unnat.github.io/cordial-sync/

  • d. Communication analysis

[Figure: reply weights over the steps in an episode, with markers where Agent 1 or Agent 2 attempted a MoveWithObject action or took a Pass action; Cordial SYNC vs. Marginal (prior).]

Join our live QA / Zoom sessions.