SLIDE 1
D++: Structural Credit Assignment in Tightly Coupled Multiagent Domains
Aida Rahmattalabi, Jen Jen Chung, Kagan Tumer
Autonomous Agents and Distributed Intelligence Lab, OSU Robotics
SLIDE 2
Problem Definition
[Diagram: team → team performance]
SLIDE 3
Loosely Coupled vs Tightly Coupled Agents
Loose coupling:
- Task consists of many single-robot tasks
- Each robot uses/requires little knowledge of the other robots to accomplish the task

Tight coupling:
- Multiple robots are required to achieve the task
- Mutual dependence of the robots on each other's performance
- The objective function is inherently non-smooth
SLIDE 4
Learning is Challenging in Tightly Coupled Tasks:
SLIDE 5
Learning is Challenging in Tightly Coupled Tasks:
The probability of SUFFICIENT agents,
SLIDE 6
Learning is Challenging in Tightly Coupled Tasks:
The probability of SUFFICIENT agents, picking the RIGHT ACTION
SLIDE 7
Learning is Challenging in Tightly Coupled Tasks:
The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME
SLIDE 8
Learning is Challenging in Tightly Coupled Tasks:
The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME is LOW
SLIDE 9
Learning is Challenging in Tightly Coupled Tasks:
The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME is LOW. How can we devise agent-specific evaluation functions to reward the stepping-stone actions?
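As a rough illustration with assumed numbers (not from the slides): if each of n agents independently picks the right action with probability p, the chance that at least k of them do so simultaneously is

Pr[X >= k] = \sum_{m=k}^{n} \binom{n}{m} p^m (1 - p)^{n-m}

For n = 12, p = 0.1, and k = 3 this is only about 0.11, and that is before also requiring the right timing.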
SLIDE 10
Difference Evaluation Function (Agogino and Tumer, 2004)
- Measures an individual agent's contribution to the global team performance
- Removes an agent and replaces it with a "counterfactual" agent
D_i(z) = G(z) - G(z_{-i})

G(z): global system performance ("the world with me")
G(z_{-i}): global system performance excluding the effects of agent i ("the world without me")
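A minimal sketch of this computation in Python, assuming the joint state is a plain list of per-agent states and G is a callable (illustration only, not the authors' code):

```python
def difference_reward(G, joint_state, i):
    """Difference evaluation D_i(z) = G(z) - G(z_{-i}).

    G           -- callable scoring a joint state (global performance)
    joint_state -- list of per-agent states/actions
    i           -- index of the agent being evaluated

    The counterfactual here simply removes agent i; other choices
    (e.g., replacing it with a fixed default agent) are possible.
    """
    without_i = joint_state[:i] + joint_state[i + 1:]
    return G(joint_state) - G(without_i)
```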
SLIDE 11
D++: An Extension to Difference Reward (D)
- The reward function evaluates the performance of a "super agent"
- It introduces "counterfactual" agents
- Provides agents with a stronger feedback signal
- Rewards the stepping stones that lead to achieving the system objective
D_i^{++}(n) = [ G(z_{+n,i}) - G(z) ] / n

G(z_{+n,i}): global system performance where n counterfactual "copies of me" are present
G(z): global system performance without them
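Continuing the same sketch (same assumed list-of-agents representation; dividing by n gives the per-copy gain):

```python
def dpp_reward(G, joint_state, i, n):
    """D++ evaluation for agent i with n counterfactual copies:
    D++_i(n) = (G(z_{+n,i}) - G(z)) / n.

    Appends n copies of agent i's state/action to the joint state,
    so the numerator measures what a "super agent" standing in for
    n extra agents like i adds to the global performance.
    """
    with_copies = joint_state + [joint_state[i]] * n
    return (G(with_copies) - G(joint_state)) / n
```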
SLIDE 12
Example:
SLIDE 13
D++: An Extension to Difference Reward (D)
- How many “counterfactual” agents should be added?
SLIDE 14
D++: An Extension to Difference Reward (D)
- How many “counterfactual” agents should be added?
Search over different numbers of counterfactual agents until a non-zero reward is reached
SLIDE 15
D++: An Extension to Difference Reward (D)
- How many “counterfactual” agents should be added?
Search over different numbers of counterfactual agents until a non-zero reward is reached
- What if a sufficient number of agents is already available? Is D++ enough?
SLIDE 16
D++: An Extension to Difference Reward (D)
- How many “counterfactual” agents should be added?
Search over different numbers of counterfactual agents until a non-zero reward is reached
- What if a sufficient number of agents is already available? Is D++ enough?
Calculate both D and D++ and choose the higher of the two
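Putting the two rules together, a sketch of the resulting credit assignment (reusing the difference_reward and dpp_reward sketches from the earlier slides; this mirrors the slide's description, not necessarily the authors' exact algorithm):

```python
def dpp_credit(G, joint_state, i, n_agents):
    """Credit for agent i: search over counterfactual team sizes
    until D++ gives a non-zero reward, then return the larger of
    that value and the plain difference reward D.
    """
    d = difference_reward(G, joint_state, i)
    for n in range(1, n_agents):               # grow the counterfactual team
        dpp = dpp_reward(G, joint_state, i, n)
        if dpp != 0:                            # first stepping-stone signal found
            return max(d, dpp)
    return d                                    # no D++ signal; fall back to D
```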
SLIDE 17
Cooperative CoEvolutionary Algorithm (CCEA)
- Train NN policy weights via a cooperative coevolutionary algorithm (CCEA):
1. Initialize M populations of k NNs
2. Mutate each to create M populations of 2k NNs
3. Randomly select one NN from each population to create team T_i
4. Assess team performance and assign fitness to team members (the credit-assignment step)
5. Retain the k best-performing NNs of each population, then repeat from step 2
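A skeleton of this loop in Python (make_nn, mutate, and evaluate_team are assumed helpers; this sketches the procedure above, not the authors' implementation):

```python
import random

def ccea(M, k, generations, make_nn, mutate, evaluate_team):
    """Cooperative coevolution: one population of k candidate NNs per
    agent slot. evaluate_team(team) returns one fitness per team
    member -- this is where G, D, or D++ plugs in as credit assignment.
    """
    # Initialize M populations of k candidate networks.
    pops = [[make_nn() for _ in range(k)] for _ in range(M)]
    for _ in range(generations):
        # Mutate each population of k NNs into a population of 2k NNs.
        pops = [pop + [mutate(nn) for nn in pop] for pop in pops]
        # Randomly team up one NN from each population per rollout.
        for pop in pops:
            random.shuffle(pop)
        fitness = [[0.0] * (2 * k) for _ in range(M)]
        for t in range(2 * k):
            team = [pop[t] for pop in pops]
            for m, f in enumerate(evaluate_team(team)):
                fitness[m][t] = f
        # Retain the k best-performing NNs of each population.
        pops = [[nn for _, nn in
                 sorted(zip(fit, pop), key=lambda p: -p[0])[:k]]
                for fit, pop in zip(fitness, pops)]
    return pops
```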
SLIDE 18
Domain: Multi-robot Exploration
- Neural-network controllers
  - NN state vector s = [s_1, s_2], computed over the quadrants q around robot i:

    s_{1,q,i} = \sum_{j \in I_q} V_j / d(L_j, L_i) ,   s_{2,q,i} = \sum_{i' \in N_q} 1 / d(L_{i'}, L_i)

    (I_q: POIs in quadrant q, with values V_j; N_q: other robots in quadrant q; d(.,.): distance between locations L)
  - Control actions: [dx, dy]
- Team observation reward:

    G = \sum_i \sum_j \sum_k V_i N^1_{i,j} N^2_{i,k} / ( (d_{i,j} + d_{i,k}) / 2 )

    (N_{i,j}: indicator that robot j observes POI i, for the two simultaneous observers j and k; d_{i,j}: their distance)
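A minimal Python sketch of these quantities, assuming a four-quadrant sensor layout, tuple-based positions, and an observation radius and coupling level of two (all illustration choices, not taken from the slides):

```python
import math

def dist(a, b):
    """Euclidean distance, floored to avoid division by zero."""
    return max(math.hypot(a[0] - b[0], a[1] - b[1]), 1e-3)

def quadrant(dx, dy):
    """Index (0-3) of the quadrant a relative offset falls in."""
    return (dx < 0) * 2 + (dy < 0)

def state_vector(me, others, pois):
    """NN inputs s = [s1, s2] from the equations above.

    me     -- (x, y) of the evaluated robot
    others -- (x, y) positions of the other robots
    pois   -- (value, (x, y)) points of interest
    """
    s1 = [0.0] * 4  # per-quadrant POI density: sum_j V_j / d(L_j, L_i)
    s2 = [0.0] * 4  # per-quadrant robot density: sum_i' 1 / d(L_i', L_i)
    for v, loc in pois:
        s1[quadrant(loc[0] - me[0], loc[1] - me[1])] += v / dist(loc, me)
    for loc in others:
        s2[quadrant(loc[0] - me[0], loc[1] - me[1])] += 1.0 / dist(loc, me)
    return s1 + s2  # 8 inputs; the NN maps these to the motion [dx, dy]

def team_reward(pois, robots, obs_radius=4.0, coupling=2):
    """Tightly coupled observation reward: a POI scores only when at
    least `coupling` robots observe it simultaneously, and its value
    is discounted by the observers' mean distance.
    """
    G = 0.0
    for v, loc in pois:
        d = sorted(dist(r, loc) for r in robots)[:coupling]
        if len(d) == coupling and d[-1] <= obs_radius:
            G += v / (sum(d) / coupling)
    return G
```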
SLIDE 19
Experiments:
Number of robots | Number of POIs | Type          | Required observations
12               | 10             | Homogeneous   | 3
12               | 10             | Homogeneous   | 6
9                | 15             | Heterogeneous | [1,1,1]
9                | 15             | Heterogeneous | [3,1,1]
SLIDE 20
Homogeneous Agents: Number of observations = 3
SLIDE 21
Homogeneous Agents: Number of observations = 3
SLIDE 22
Homogeneous Agents: Learned Policies of D++ learners
SLIDE 23
Homogeneous Agents: Learned Policies of D++ learners
SLIDE 24
Homogeneous Agents: Learned Policies of D++ learners
SLIDE 25
Homogeneous Agents: Learned Policies of D++ learners
SLIDE 26
Homogeneous Agents: Number of observations = 6
SLIDE 27
Homogeneous Agents: Number of observations = 6
SLIDE 28
Heterogeneous Agents: Number of observations = [1, 1, 1]
[Plot: team performance G(z) vs. calls to G, comparing G, D, and D++ learners]
SLIDE 29
Heterogeneous Agents: Learned Policies of D++ learners
[Plot: learned robot trajectories of D++ learners in the X-Y plane]
SLIDE 30
Heterogeneous Agents: Number of observations = [3, 1, 1]
[Plot: team performance G(z) vs. calls to G, comparing G, D, and D++ learners]
SLIDE 31
Conclusion
- D++ is a new reward structure for tightly coupled multiagent domains
- D++ outperforms both G and D
  - It rewards the stepping-stone actions required for long-term success
- Robot heterogeneity and tighter coupling challenge G and D learners
  - D++ learners can still learn high-reward policies
SLIDE 32