Tightly and Loosely Coupled Decision Paradigms in Multiagent Expedition
Yang Xiang & Frank Hanshar University of Guelph Ontario, Canada PGM 2008 September 18, 2008
Outline: Introduction · What is Multiagent Expedition? · Tightly coupled framework (CDN) · Loosely coupled framework (RMM) · Experimental comparison · Conclusions
Two paradigms for cooperative multiagent decision making:
Loosely coupled frameworks (LCFs): agents do not communicate; they rely on observing other agents' actions to discern state and coordinate with each other.
Tightly coupled frameworks (TCFs): agents communicate through messages over interfaces that are rigorously defined.
The relative strengths and weaknesses of each paradigm are poorly understood.
Goal: compare LCFs and TCFs for multiagent planning.
We select one representative framework from LCFs (RMM) and one from TCFs (CDN), and compare them experimentally on a test problem called multiagent expedition.
What is multiagent expedition? Agents move in an open area where rewards are distributed in the environment. Uncertainty over the effect of movement actions is present. Agents cooperate to maximize the team reward.
Motivating applications: sea-floor exploration, disaster rescue, ...
(Figure, panels (a)-(c): an example grid environment; each cell carries a reward in [0, 1], e.g. 0.3, 0.4, 0.1, ... A movement action succeeds in the intended direction with probability 0.9; each alternative outcome has probability 0.025.)
Cell rewards change after they are visited. Each cell carries a reward pair d = (r1, r2), r1, r2 ∈ [0, 1], e.g. d = (0.1, 0.2); one component applies before the cell is visited and the other is used afterwards.
(Figure: the cells around agent A, whose rewards drop across three snapshots, e.g. 0.2 0.3 0.1 → 0.2 0.1 0.1 → 0.1 0.1 0.1.)
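The visit-dependent reward above can be sketched as a small stateful function. This is an illustrative sketch, assuming the first pair component is paid on the first visit and the second afterwards; the class and method names are not from the talk:

```python
# Sketch of visit-dependent cell rewards in multiagent expedition.
# Assumption (illustrative): a cell's reward pair d = (r_first, r_later)
# pays r_first on the first visit and r_later on every later visit.

class Grid:
    def __init__(self, rewards):
        # rewards: {(x, y): (r_first, r_later)} with each r in [0, 1]
        self.rewards = dict(rewards)
        self.visited = set()

    def collect(self, cell):
        r_first, r_later = self.rewards[cell]
        r = r_later if cell in self.visited else r_first
        self.visited.add(cell)
        return r

grid = Grid({(0, 0): (0.3, 0.1), (1, 0): (0.2, 0.1)})
first = grid.collect((0, 0))   # 0.3: first visit pays the higher reward
again = grid.collect((0, 0))   # 0.1: the cell is less rewarding once visited
```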
Meeting at a cell can pay off (figure: cells labelled with rewards 0.3 and 0.7 around agents A and B):
A ➙ North, B ➙ South: A reward = 0.3, B reward = 0.3, Total = 0.6.
A ➙ North, B ➙ West: both reach the same cell; A reward = 0.7/2 = 0.35, B reward = 0.7/2 = 0.35, Total = 0.7.
The coordinated moves earn the team more even though the agents split the cell's reward.
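The trade-off above can be checked by brute force over joint moves. A minimal sketch: the positions, cell values and even-split rule are modelled on the example, but the exact layout and all names are illustrative assumptions:

```python
from itertools import product

# Moves as coordinate offsets (x grows east, y grows north).
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0),
         "west": (-1, 0), "halt": (0, 0)}

# Illustrative layout: A at (1, 0), B at (2, 1). Cell (1, 1) pays 0.3
# to a lone agent but 0.7 in total (split evenly) when both agents meet
# there; cell (2, 0) pays 0.3 to a lone agent.
UNILATERAL = {(1, 1): 0.3, (2, 0): 0.3}
COOPERATIVE = {(1, 1): 0.7}   # team reward when two agents meet here

def step(pos, move):
    dx, dy = MOVES[move]
    return (pos[0] + dx, pos[1] + dy)

def team_reward(pos_a, pos_b):
    if pos_a == pos_b:
        return COOPERATIVE.get(pos_a, 0.0)
    return UNILATERAL.get(pos_a, 0.0) + UNILATERAL.get(pos_b, 0.0)

a0, b0 = (1, 0), (2, 1)
best = max(product(MOVES, repeat=2),
           key=lambda mv: team_reward(step(a0, mv[0]), step(b0, mv[1])))
best_value = team_reward(step(a0, best[0]), step(b0, best[1]))
# ("north", "west"): meeting on the 0.7 cell beats 0.3 + 0.3 = 0.6
```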
The environment is Markovian: the next state is independent of the history given the current state and the joint action of the agents.
Each agent observes only its own position and its own local neighbourhood.
The number of joint plans to evaluate grows exponentially with the number of agents and the planning horizon, e.g. 5^24 ≈ 6 × 10^16 possible joint plans.
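The exponential blow-up quoted above is plain arithmetic: 5 actions per agent per step, with the total number of agent-step choices assumed here to be 24 (the decomposition into agents × horizon is not spelled out on the slide):

```python
# 5 possible actions per agent per step; with 24 agent-step action
# choices in total, the number of distinct joint plans is 5**24.
actions = 5
agent_steps = 24          # assumed split, e.g. agents x planning horizon
plans = actions ** agent_steps
# plans is about 6 x 10^16, far too many to evaluate exhaustively
```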
In many cooperative domains, agent interaction is sparse. Each agent encodes its local decision knowledge as a decision network.
A design network is a DAG G = (V, E) with V = D ∪ T ∪ M ∪ U, where D are design nodes, T are environmental-factor nodes, M are performance nodes, and U are utility nodes.
Each non-design node is assigned a conditional probability distribution. Each design node d is assigned P(d|π(d)), which encodes a design constraint. An optimal design d∗ is one whose expected utility EU(d∗) is maximal.
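Choosing d∗ is expected-utility maximization: enumerate candidate designs, weight each outcome's utility by its probability under the design, and keep the argmax. A toy sketch (all distributions and names below are made-up illustrations, not the talk's model):

```python
# Toy expected-utility maximization over designs d:
# EU(d) = sum over outcomes o of P(o | d) * U(o); d* = argmax_d EU(d).
P = {  # P(outcome | design), illustrative numbers
    "d1": {"good": 0.7, "bad": 0.3},
    "d2": {"good": 0.4, "bad": 0.6},
}
U = {"good": 1.0, "bad": 0.2}  # utility of each outcome

def eu(design):
    return sum(p * U[o] for o, p in P[design].items())

d_star = max(P, key=eu)
# EU(d1) = 0.7*1.0 + 0.3*0.2 = 0.76; EU(d2) = 0.4*1.0 + 0.6*0.2 = 0.52
```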
CDN extends multiply sectioned Bayesian networks to multiagent decision making. Agents interface through a small set of shared variables between agents.
Movement transition model P(psx,1 | mvx,i):

mvx,i   psx,1=(0,0)  (1,0)   (−1,0)  (0,1)   (0,−1)
north   0.025        0.025   0.025   0.9     0.025
south   0.025        0.025   0.025   0.025   0.9
east    0.025        0.9     0.025   0.025   0.025
west    0.025        0.025   0.9     0.025   0.025
halt    0.9          0.025   0.025   0.025   0.025
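The table reads as a conditional distribution over displacements: the intended move succeeds with probability 0.9, and each of the other four outcomes gets 0.025. A sketch of sampling the realized position (function names are illustrative):

```python
import random

# P(ps | mv): displacement distribution from the transition table.
OUTCOMES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
SUCCESS = {"north": (0, 1), "south": (0, -1), "east": (1, 0),
           "west": (-1, 0), "halt": (0, 0)}

def transition_probs(move):
    # One row of the table: 0.9 on the intended outcome, 0.025 elsewhere.
    return [0.9 if o == SUCCESS[move] else 0.025 for o in OUTCOMES]

def sample_position(pos, move, rng=random):
    dx, dy = rng.choices(OUTCOMES, weights=transition_probs(move))[0]
    return (pos[0] + dx, pos[1] + dy)

row = transition_probs("north")   # sums to 1.0: 0.9 + 4 * 0.025
```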
Joint reward model P(rwABC,1 = y | psA,1, psB,1, psC,1):

psA,1     psB,1     psC,1     P(rwABC,1 = y | psA,1, psB,1, psC,1)
(0,0)     (0,0)     (0,0)     0.4
(0,0)     (0,0)     (1,0)     0.2
(0,0)     (0,0)     (2,0)     0.1
...       ...       ...       ...
(-2,-2)   (-2,-2)   (-2,0)    ...
(-2,-2)   (-2,-2)   (-2,-1)   0.3
(-2,-2)   (-2,-2)   (-2,-2)   1.0
Movements are modelled as design nodes. Agents are coupled through performance nodes. Solving the CDN finds the maximal utility design, which corresponds to the globally optimal joint plan.
(Figure: CDN of a 3-agent group (A, B, C) for expedition planning, where Ψ is the hypertree. Each agent's subnet contains movement design nodes mv (e.g. mvA,1, mvA,2), position performance nodes ps (e.g. psA,1, psA,2) and reward utility nodes rw (e.g. rwAB,1, rwBC,2); adjacent subnets share interface variables such as psB,1 and psB,2.)
The Recursive Modeling Method (RMM) is representative of the loosely coupled paradigm. An agent's decision situation is described by a payoff matrix over the agents' joint actions. Uncertainty about other agents is handled by recursively nested models of the other agents' models.
(Figure: an RMM payoff matrix for agents A1, A2, A3. Rows list two-step joint actions such as {N, N}, {N, S}, {H, W}, {H, H}; each entry, e.g. 0.5, gives the utility of a joint action.)
(Figure: the observations of agents A and B partitioned into A's private observations, the observations shared between A and B, and B's private observations.)
Agent B cannot observe the neighbourhood profiles (nbp) of A and C, so it keeps one payoff matrix per combination of profile states, weighted by beliefs such as:
P(nbpB_A = allLow, nbpB_C = ¬allLow), P(nbpB_A = ¬allLow, nbpB_C = allLow),
P(nbpB_A = allLow, nbpB_C = allLow), P(nbpB_A = ¬allLow, nbpB_C = ¬allLow).

A representative matrix (the matrices differ only in profile-dependent entries, e.g. 0.50 vs 0.40 vs 0.70):

A       B       C       u
(N,N)   (N,N)   (N,N)   0.85
(N,N)   (N,N)   (N,S)   0.50
(N,N)   (N,S)   (N,N)   0.50
...     ...     ...     ...
(H,H)   (H,H)   (H,E)   0.30
(H,H)   (H,H)   (H,W)   0.41
(H,H)   (H,H)   (H,H)   0.72

With n such binary profile variables, 2^n payoff matrices are required.
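B's decision then mixes these matrices with its beliefs: the expected utility of a joint action is the belief-weighted average of the corresponding entries across the 2^n matrices. A minimal sketch with two profile states; the belief values and second-column entries are illustrative stand-ins:

```python
# Belief-weighted combination of RMM payoff matrices. Each unobserved
# neighbourhood-profile state s has its own payoff matrix u[s]; the
# agent weights them by its belief P(s). Numbers below are illustrative.
beliefs = {"allLow": 0.6, "notAllLow": 0.4}   # P(s)
payoff = {  # u[s][joint action], entries echoing the slide's matrices
    "allLow":    {("N,N", "N,N", "N,N"): 0.85, ("N,N", "N,N", "N,S"): 0.40},
    "notAllLow": {("N,N", "N,N", "N,N"): 0.85, ("N,N", "N,N", "N,S"): 0.70},
}

def expected_u(joint):
    return sum(beliefs[s] * payoff[s][joint] for s in beliefs)

best = max(payoff["allLow"], key=expected_u)
# (N,N),(N,N),(N,N): 0.85  vs  (N,N),(N,N),(N,S): 0.6*0.40 + 0.4*0.70 = 0.52
```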
With m candidate models for each of n other agents, the number of model combinations grows as m^n.
To keep this tractable, the joint effect of A's and C's moves is assumed to factorize:
P(areaA, areaC | moveA, moveC) = P(areaA | moveA) · P(areaC | moveC)
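The factorization is easy to state in code: the joint distribution over the two agents' affected areas is replaced by the product of the per-agent marginals, which is exact only if the effects really are independent. All numbers and names below are illustrative:

```python
# RMM-style factorization: approximate the joint distribution of the
# areas affected by A's and C's moves by a product of marginals.
p_area_a = {"a1": 0.8, "a2": 0.2}   # P(area_A | move_A), toy numbers
p_area_c = {"c1": 0.5, "c2": 0.5}   # P(area_C | move_C), toy numbers

def factored_joint():
    # P(area_A, area_C | move_A, move_C) ≈ P(area_A|move_A) * P(area_C|move_C)
    return {(a, c): p_area_a[a] * p_area_c[c]
            for a in p_area_a for c in p_area_c}

joint = factored_joint()
total = sum(joint.values())   # the product distribution still sums to 1
```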
Experiments compare the frameworks controlling teams of agents in three environment types: Barren, Dense and Path.
(Figure: experiment environments, panels (a) and (b).)
Table 1: Experimental results (mean µ and standard deviation σ). CDN achieves the highest mean in every environment.

        Barren          Dense           Path
        µ       σ       µ       σ       µ       σ
CDN     55.84   4.21    25.14   3.27    20.41   3.39
GRDU    48.56   0.56    12.32   0.20    12.20   0.15
GRDB    48.64   0.62    18.57   1.10    16.80   2.39
RMM     50.35   5.95    18.50   3.39    18.71   2.79
Table 2: t-test results: confidence (%) that CDN outperforms each alternative.

        GRDU      GRDB      RMM
Barren  √ 99.99   √ 99.99   √ 99.99
Dense   √ 99.99   √ 99.99   √ 99.99
Path    √ 99.99   √ 99.99   √ 96.20
CDN performed significantly better than RMM. Why? When several joint plans are equally good, RMM agents can disagree on the choice of action:
(Figure: a corridor of cells (0, 0) through (6, 0) with agents A, B, C; two scenarios S1 and S2 assign rewards b and u to different joint moves.) When both scenarios are optimal, an agent that must guess which one the others follow obtains only (b + u)/2 in expectation.
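The (b + u)/2 figure can be made concrete: if two joint plans are equally optimal (utility u when all agents follow the same one) but a mismatch earns only some lower b, then independent uniform guessing matches half the time. A sketch; the particular values of u and b are symbolic stand-ins for the slide's rewards:

```python
# Expected utility under uncoordinated tie-breaking between two
# equally optimal joint plans S1 and S2.
u = 1.0   # utility when both agents follow the same optimal plan
b = 0.2   # utility when their plan choices mismatch (b < u)

plans = ["S1", "S2"]
# Each agent picks a plan independently and uniformly at random:
# 2 matching and 2 mismatching pairs out of 4 equally likely pairs.
total = sum((u if pa == pb else b) for pa in plans for pb in plans)
expected = total / len(plans) ** 2   # equals (u + b) / 2
```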
An LCF agent may therefore have to act without sufficient information. In a CDN, agents' subdomains are rendered conditionally independent to take advantage of communication. A MAID could be adopted, but the above limitation stands.
Conclusions: In LCFs, an agent coordinates with other agents based on observation. This can fail when multiple optimal joint plans exist. In TCFs, messages convey sufficient states and decisions and lead to better coordination.
We expect that our empirical results can generalize to other domains.
References
- ... Menlo Park, CA, 1990. AAAI Press.
- ... Proceedings of the 31st Hawaii International Conference on System Sciences, volume 5, pages 142–151, Los Alamitos, CA, January 1998. IEEE Computer Society.
- ... Immerman, and S. Zilberstein. Mathematics of Operations Research, 27(4):819–840, 2002.
- ... Database and Expert Systems Applications: 13th International Conference, DEXA 2002, Aix-en-Provence, France, September 2–6, 2002. Proceedings, Lecture Notes in Computer Science, 2002.
- ... Markov decision processes. Journal of Artificial Intelligence Research, 22:423–455, 2004.
- ... Joint Conference on Autonomous Agents and Multiagent Systems, July 2005.