

SLIDE 1

D++: Structural Credit Assignment in Tightly Coupled Multiagent Domains

Aida Rahmattalabi, Jen Jen Chung, Kagan Tumer
Autonomous Agents and Distributed Intelligence Lab, OSU Robotics

SLIDE 2

Problem Definition

DEMUR 2016 | Aida Rahmattalabi | Oregon State University

team performance

SLIDE 3

Loosely Coupled vs Tightly Coupled Agents

Loose coupling:

  • The task consists of many single-robot tasks
  • Each robot requires little knowledge of the other robots to accomplish the task

Tight coupling:

  • Multiple robots are required to achieve the task
  • The robots are mutually dependent on each other's performance
  • The objective function is inherently non-smooth


SLIDE 4

Learning is Challenging in Tightly Coupled Tasks:


SLIDE 5

Learning is Challenging in Tightly Coupled Tasks:

The probability of SUFFICIENT agents,


SLIDE 6

Learning is Challenging in Tightly Coupled Tasks:

The probability of SUFFICIENT agents, picking the RIGHT ACTION


SLIDE 7

Learning is Challenging in Tightly Coupled Tasks:

The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME


SLIDE 8

Learning is Challenging in Tightly Coupled Tasks:

The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME is LOW


SLIDE 9

Learning is Challenging in Tightly Coupled Tasks:

The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME is LOW.

How can we devise agent-specific evaluation functions to reward the stepping-stone actions?


SLIDE 10

Difference Evaluation Function (Agogino and Tumer, 2004)

– Measures an individual agent's contribution to the global team performance
– Removes an agent and replaces it with a "counterfactual" agent


D_i = G(z) − G(z_{−i})

G(z): global system performance ("the world with me")
G(z_{−i}): global system performance excluding the effects of agent i ("the world without me")
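As a concrete sketch, the difference reward just evaluates G twice. Here `G` is any callable scoring a joint state, and the list-of-states representation is an illustration, not the slides' actual implementation:

```python
def difference_reward(G, joint_state, i):
    """D_i = G(z) - G(z_{-i}): agent i's marginal contribution.

    G: callable scoring a joint state ("the world with me").
    The counterfactual here simply removes agent i
    ("the world without me").
    """
    with_me = G(joint_state)
    without_me = G([s for k, s in enumerate(joint_state) if k != i])
    return with_me - without_me


# Toy objective: a POI scores 1 if at least one agent observes it.
G = lambda z: float(any(s == "at_poi" for s in z))
```

Only agent 0 is at the POI, so `difference_reward(G, ["at_poi", "away"], 0)` is 1.0, while agent 1 receives 0.0: the redundant agent earns no credit.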

SLIDE 11

D++: An Extension to Difference Reward (D)

– The reward function evaluates the performance of a "super agent"
– It introduces "counterfactual" agents
– It provides agents with a stronger feedback signal
– It rewards the stepping stones that lead to achieving the system objective


D_i^{++}(n) = [G(z_{+n·i}) − G(z)] / n

G(z_{+n·i}): global system performance where n "multiple copies of me" are present
G(z): global system performance
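A minimal sketch of the D++ computation, reusing the same illustrative joint-state representation (function names are assumptions, not from the slides):

```python
def dpp_reward(G, joint_state, i, n):
    """D++_i(n) = (G(z_{+n*i}) - G(z)) / n: the per-copy gain when n
    counterfactual copies of agent i ("multiple copies of me") are
    appended to the joint state."""
    z_plus = joint_state + [joint_state[i]] * n
    return (G(z_plus) - G(joint_state)) / n


# Tightly coupled toy objective: a POI needs TWO simultaneous observers.
G = lambda z: float(sum(s == "at_poi" for s in z) >= 2)
```

With a single agent at the POI, the plain difference reward is zero (removing it changes nothing), but `dpp_reward(G, ["at_poi", "away"], 0, 1)` is 1.0: the stepping-stone action of reaching the POI gets rewarded.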

SLIDE 12

Example:


SLIDE 13

D++: An Extension to Difference Reward (D)

  • How many “counterfactual” agents should be added?


SLIDE 14

D++: An Extension to Difference Reward (D)

  • How many “counterfactual” agents should be added?

Search over different numbers of counterfactual agents until a non-zero reward is reached


SLIDE 15

D++: An Extension to Difference Reward (D)

  • How many “counterfactual” agents should be added?

Search over different numbers of counterfactual agents until a non-zero reward is reached

  • What if a sufficient number of agents is already available? Is D++ enough?


SLIDE 16

D++: An Extension to Difference Reward (D)

  • How many “counterfactual” agents should be added?

Search over different numbers of counterfactual agents until a non-zero reward is reached

  • What if a sufficient number of agents is already available? Is D++ enough?

Calculate both D and D++ and choose the higher of the two
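The combined rule on this and the previous slides can be sketched as one self-contained routine (the joint-state representation and names are illustrative assumptions):

```python
def dpp_credit(G, z, i, n_max):
    """Combined credit: compute D_i; search n = 1..n_max counterfactual
    copies of agent i until D++_i(n) is non-zero; return the higher of
    the two signals."""
    g_z = G(z)
    # D_i = G(z) - G(z without agent i)
    d = g_z - G([s for k, s in enumerate(z) if k != i])
    dpp = 0.0
    for n in range(1, n_max + 1):
        dpp = (G(z + [z[i]] * n) - g_z) / n
        if dpp != 0.0:
            break  # first counterfactual team size with a non-zero gain
    return max(d, dpp)


# Toy objective requiring THREE simultaneous observers at the POI.
G = lambda z: float(sum(s == "at_poi" for s in z) >= 3)
```

For `z = ["at_poi", "away", "away"]`, D is zero, the n = 1 counterfactual still falls short, and n = 2 completes the observation, so agent 0 receives D++ = 1/2 = 0.5.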


SLIDE 17

Cooperative CoEvolutionary Algorithm (CCEA)

  • Train NN policy weights via a cooperative coevolutionary algorithm (CCEA)


1. Initialize M populations of k NNs
2. Mutate each to create M populations of 2k NNs
3. Randomly select one NN from each population to create team T_i
4. Assess team performance and assign fitness to team members (credit assignment)
5. Retain the k best-performing NNs of each population, then repeat from step 2
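The loop above can be sketched as follows. Policies are stand-in floats rather than NN weight vectors, and `evaluate_team`/`mutate` are assumed callables; in the actual experiments the credit-assignment step inside `evaluate_team` would be G, D, or D++:

```python
import random

def ccea(evaluate_team, mutate, M, k, n_generations):
    """Minimal CCEA sketch: M populations of k policies coevolve.

    evaluate_team maps a team (one policy per population) to a list of
    M per-agent fitness values -- the credit-assignment step.
    """
    pops = [[random.random() for _ in range(k)] for _ in range(M)]
    for _ in range(n_generations):
        # Mutate each population of k into a population of 2k.
        pops = [pop + [mutate(p) for p in pop] for pop in pops]
        fitness = [[0.0] * (2 * k) for _ in range(M)]
        for _ in range(2 * k):
            # Randomly select one policy per population to form a team T_i.
            idx = [random.randrange(2 * k) for _ in range(M)]
            team = [pop[j] for pop, j in zip(pops, idx)]
            # Assess team performance; assign fitness to team members.
            for m, f in enumerate(evaluate_team(team)):
                fitness[m][idx[m]] = max(fitness[m][idx[m]], f)
        # Retain the k best-performing policies of each population.
        pops = [[p for _, p in sorted(zip(fitness[m], pops[m]),
                                      reverse=True)[:k]]
                for m in range(M)]
    return pops
```

For example, `ccea(lambda team: [sum(team)] * 2, lambda p: p + 0.05, M=2, k=3, n_generations=5)` returns two populations of three policies each.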

SLIDE 18

Domain: Multi-robot Exploration

  • Neural-network controllers

– NN state vector
– Control actions

  • Team observation reward:


NN state vector for robot i, per quadrant q:

s = [s_1, s_2]
s_{1,q,i} = Σ_{j ∈ I_q} V_j / d(L_j, L_i)
s_{2,q,i} = Σ_{i' ∈ N_q} 1 / d(L_{i'}, L_i)

Control actions: [dx, dy]

Team observation reward:

G = Σ_i Σ_j Σ_k [ V_i · N^1_{i,j} · N^2_{i,k} ] / [ ½ (d_{i,j} + d_{i,k}) ]
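The two sensor channels can be sketched as below; the four-quadrant partition by offset sign and the distance-clamping constant are assumptions made for illustration:

```python
import math

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def _quadrant(origin, p):
    # Index 0..3 from the sign of the offset relative to the robot.
    dx, dy = p[0] - origin[0], p[1] - origin[1]
    return 2 * (dy < 0) + (dx < 0)

def sensor_state(robot, pois, others):
    """Eight-element state [s1 | s2] for a robot i at position `robot`:
    s1_q = sum over POIs j in quadrant q of V_j / d(L_j, L_i),
    s2_q = sum over robots i' in quadrant q of 1 / d(L_i', L_i)."""
    s1, s2 = [0.0] * 4, [0.0] * 4
    for value, pos in pois:            # pois: list of (V_j, L_j)
        s1[_quadrant(robot, pos)] += value / max(_dist(robot, pos), 1e-6)
    for pos in others:                 # others: teammate positions L_i'
        s2[_quadrant(robot, pos)] += 1.0 / max(_dist(robot, pos), 1e-6)
    return s1 + s2
```

A POI of value 10 at distance 5 contributes 2.0 to its quadrant's POI channel; the NN maps this 8-vector to the [dx, dy] control action.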

SLIDE 19

Experiments:


Number of robots | Number of POIs | Type          | Required observations
12               | 10             | Homogeneous   | 3
12               | 10             | Homogeneous   | 6
9                | 15             | Heterogeneous | [1, 1, 1]
9                | 15             | Heterogeneous | [3, 1, 1]

SLIDE 20

Homogeneous Agents: Number of observations = 3


SLIDE 21

Homogeneous Agents: Number of observations = 3


SLIDE 22

Homogeneous Agents: Learned Policies of D++ learners


SLIDE 23

Homogeneous Agents: Learned Policies of D++ learners


SLIDE 24

Homogeneous Agents: Learned Policies of D++ learners


SLIDE 25

Homogeneous Agents: Learned Policies of D++ learners


SLIDE 26

Homogeneous Agents: Number of observations = 6


SLIDE 27

Homogeneous Agents: Number of observations = 6


SLIDE 28

Heterogeneous Agents: Number of observations = [1, 1, 1]

[Plot: team performance G(z) (0-50) vs. calls to G (1,000-8,000) for G, D, and D++ learners]


SLIDE 29

Heterogeneous Agents: Learned Policies of D++ learners

[Plot: learned D++ policies on the world, X and Y axes from 5 to 40]


SLIDE 30

Heterogeneous Agents: Number of observations = [3, 1, 1]

[Plot: team performance G(z) (2-14) vs. calls to G (1,000-8,000) for G, D, and D++ learners]


SLIDE 31

Conclusion

  • D++ is a new reward structure for tightly coupled multiagent domains
  • D++ outperforms both G and D

– It rewards the stepping-stone actions required for long-term success

  • Robot heterogeneity and tighter coupling challenge G and D learners

– D++ learners can still learn high-reward policies


SLIDE 32

D++: Structural Credit Assignment in Tightly Coupled Multiagent Domains

Aida Rahmattalabi, Jen Jen Chung, Kagan Tumer
Autonomous Agents and Distributed Intelligence Lab, OSU Robotics