Online Planning for Decentralized Stochastic Control with Partial History Sharing
Kaiqing Zhang, Erik Miehling, and Tamer Başar Coordinated Science Lab — UIUC
American Control Conference — Philadelphia, PA July 11, 2019
Robotics, Smart Grid, Unmanned Aerial Vehicles, MOBA Video Games
                      MAA*                                              Our algorithm
Centralized problem   NOMDP                                             POMDP
Common information    Empty                                             General
Local memory          Local observation history                         General
State                 System state + joint local observation history    System state + joint local memory
History               Joint policies                                    Joint prescriptions + common information
Sufficient statistic  Belief over system state + joint local            Belief over system state and joint local memory
                      MAA*                                              Our algorithm
Centralized problem   NOMDP                                             POMDP
Common information    Empty                                             General
Local memory          Local histories (observations + actions)          General
State                 System state + joint local histories              System state + joint local memory
History               Joint policies                                    Joint prescriptions + common information
Sufficient statistic  Belief over system state + joint local histories  Belief over system state and joint local memory
Algorithm 1: Decentralized Online Planning with Partial History Sharing (Agent i)

function SEARCH(h)
    repeat
        if h = ∅ then
            (x, m^1, ..., m^n) ∼ B_0
        else
            (x, m^1, ..., m^n) ∼ B(h)
        end if
        SIMULATE(x, m^1, ..., m^n, h, 0)
    until STOPPINGCONDITION()
    return argmax_{δ ∈ Γ} V(hδ)
end function

function ROLLOUT(x, m^1, ..., m^n, h, d)
    if β^d < ε then
        return 0
    end if
    γ = (γ^1, ..., γ^n) ∼ (Γ^1_rollout(h), ..., Γ^n_rollout(h))
    (u^1, ..., u^n) ← (γ^1(m^1), ..., γ^n(m^n))
    (x', y^1, ..., y^n, r) ∼ G(x, u^1, ..., u^n)
    z^i ← P^i_Z(m^i, u^i, y^i) and share z^i with all other agents
    h' ← hγz
    (m^1', ..., m^n') ← (P^1_L(m^1, u^1, y^1, z^1), ..., P^n_L(m^n, u^n, y^n, z^n))
    R ← r + β · ROLLOUT(x', m^1', ..., m^n', h', d + 1)
    return R
end function

function SIMULATE(x, m^1, ..., m^n, h, d)
    if β^d < ε then
        return 0
    end if
    if h ∉ T^i then
        for all γ ∈ Γ do
            T^i(hγ) ← (N_0(hγ), V_0(hγ))
        end for
        return ROLLOUT(x, m^1, ..., m^n, h, d)
    end if
    γ = (γ^1, ..., γ^n) ∈ argmax_{(δ^1, ..., δ^n) ∈ Γ^1 × ··· × Γ^n} [ V(hδ) + ρ · sqrt( log N(h) / N(hδ) ) ]
    (u^1, ..., u^n) ← (γ^1(m^1), ..., γ^n(m^n))
    (x', y^1, ..., y^n, r) ∼ G(x, u^1, ..., u^n)
    z^i ← P^i_Z(m^i, u^i, y^i) and share z^i with all other agents
    h' ← hγz
    (m^1', ..., m^n') ← (P^1_L(m^1, u^1, y^1, z^1), ..., P^n_L(m^n, u^n, y^n, z^n))
    N(h) ← N(h) + 1
    R ← r + β · SIMULATE(x', m^1', ..., m^n', h', d + 1)
    N(hγ) ← N(hγ) + 1
    V(hγ) ← V(hγ) + (R − V(hγ)) / N(hγ)
    return R
end function
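Three pieces of SIMULATE can be illustrated in isolation: the upper-confidence selection over prescriptions, the running-average value update, and the depth cutoff implied by β^d < ε. The following is a minimal Python sketch, not the authors' implementation. The function names, the dictionary layout for child statistics, and the choice to try unvisited children first are assumptions made for illustration only.

```python
import math


def ucb_select(children, n_parent, rho):
    """Pick the prescription maximizing V(h delta) + rho * sqrt(log N(h) / N(h delta)).

    children: dict mapping a prescription label to (visit count N, value V).
    Assumption: a child with zero visits is tried first, to avoid dividing by zero.
    """
    best, best_score = None, -math.inf
    for delta, (n, v) in children.items():
        if n == 0:
            return delta
        # max(..., 1) guards log(0) when the parent count has not been updated yet
        score = v + rho * math.sqrt(math.log(max(n_parent, 1)) / n)
        if score > best_score:
            best, best_score = delta, score
    return best


def update_mean(n, v, ret):
    """Incremental mean: N <- N + 1, then V <- V + (R - V) / N."""
    n += 1
    v += (ret - v) / n
    return n, v


def cutoff_depth(beta, eps):
    """Smallest depth d at which beta**d < eps, i.e. where the recursion returns 0."""
    d = 0
    while beta ** d >= eps:
        d += 1
    return d
```

With ρ = 0 the selection is purely greedy in V, while a larger ρ favors rarely visited prescriptions; the cutoff depth shows how the discount β and tolerance ε jointly bound the length of each simulated trajectory.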