Identifying tractable decentralized control problems on the basis of information structure
Aditya Mahajan, Ashutosh Nayyar, and Demosthenis Teneketzis Dept of EECS, University of Michigan, Ann Arbor September 26, 2008
Optimal design of decentralized systems with non-classical information structures
⊲ Standard form (Witsenhausen, 1973)
⊲ k-step delay sharing pattern (Walrand and Varaiya, 1978)
⊲ Generic team model of Witsenhausen (1988)
functions as control actions
Model A for two agents
[Figure: block diagram of the plant, Agent 1, and Agent 2, with signals $X_t$, $Y_t$, $Z_t^1$, $Z_t^2$, $U_t^1$, $U_t^2$]
Model A for two agents
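⊲ Plant: $X_{t+1} = f_t(X_t, U_t^1, U_t^2, W_t)$
⊲ Common message: $Y_t = c_t(X_t, U_{t-1}^1, U_{t-1}^2, Q_t)$
⊲ Private message: $Z_t^1 = h_t^1(X_t, N_t^1)$, $Z_t^2 = h_t^2(X_t, U_t^1, N_t^2)$
⊲ Control: $U_t^k = g_t^k(Y_t, Z_t^k, M_{t-1}^k)$
⊲ Memory update: $M_t^k = l_t^k(Y_t, Z_t^k, M_{t-1}^k)$
⊲ Instantaneous cost: $\rho_t(X_t, U_t^1, U_t^2)$.
Cost of a design: $E\left[\sum_{t=1}^{T} \rho_t(X_t, U_t^1, U_t^2)\right]$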
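The order in which the signals of Model A are generated can be sketched as a simulation loop. This is only an illustration under stated assumptions: the function names (`simulate_model_a`, the plant update `f`), the initial actions `u0`, the use of uniform random numbers as noise, and the toy instance at the bottom are all hypothetical and are not taken from the slides.

```python
import random

def simulate_model_a(f, c, h, g, l, rho, x1, m0, u0, T, rng):
    """Roll out one sample path and return sum_t rho_t(X_t, U^1_t, U^2_t)."""
    x, m, u_prev, total = x1, list(m0), u0, 0.0
    for t in range(1, T + 1):
        y = c(t, x, u_prev[0], u_prev[1], rng.random())  # common message Y_t
        z1 = h[0](t, x, rng.random())                    # private message Z^1_t
        u1 = g[0](t, y, z1, m[0])                        # control U^1_t
        m1 = l[0](t, y, z1, m[0])                        # memory M^1_t
        z2 = h[1](t, x, u1, rng.random())                # Z^2_t may depend on U^1_t
        u2 = g[1](t, y, z2, m[1])                        # control U^2_t
        m2 = l[1](t, y, z2, m[1])                        # memory M^2_t
        total += rho(t, x, u1, u2)                       # instantaneous cost
        x = f(t, x, u1, u2, rng.random())                # plant update (assumed form)
        m, u_prev = [m1, m2], (u1, u2)
    return total

# Hypothetical toy instance: binary state, noise ignored, so the path is deterministic.
toy = dict(
    f=lambda t, x, u1, u2, w: x ^ u1 ^ u2,
    c=lambda t, x, a, b, q: a + b,
    h=(lambda t, x, n: x, lambda t, x, u1, n: x),
    g=(lambda t, y, z, m: z, lambda t, y, z, m: 0),
    l=(lambda t, y, z, m: m, lambda t, y, z, m: m),
    rho=lambda t, x, u1, u2: float(x),
)
total = simulate_model_a(**toy, x1=1, m0=(0, 0), u0=(0, 0), T=3,
                         rng=random.Random(0))
```

Averaging `total` over many sample paths with genuinely random noise would give a Monte Carlo estimate of the cost of a design.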
Model A for two agents
[Figure: the same block diagram of the plant, Agent 1, and Agent 2 (signals $X_t$, $Y_t$, $Z_t^1$, $Z_t^2$, $U_t^1$, $U_t^2$), with three × marks]
⊲ Non-classical information structures
⊲ Sequential system
[Figure: timing diagram of the variables $X_t$, $Q_t$, $Y_t$, $N_t^1$, $Z_t^1$, $U_t^1$, $M_t^1$, $N_t^2$, $Z_t^2$, $U_t^2$, $M_t^2$ and the control laws $(g_t^1, l_t^1)$, $(g_t^2, l_t^2)$; legend: variables, control laws]
Common Agent
Common agent observes all common messages
$U_t^k = g_t^k(Y_t, Z_t^k, M_{t-1}^k) = \hat g_t^k(Z_t^k, M_{t-1}^k)$, where $\hat g_t^k = \gamma_t^k(Y_t)$
Similarly, $M_t^k = l_t^k(Y_t, Z_t^k, M_{t-1}^k) = \hat l_t^k(Z_t^k, M_{t-1}^k)$, where $\hat l_t^k = \lambda_t^k(Y_t)$
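The step from a control law to a partial function chosen by the common agent amounts to fixing the first argument of the control law. A minimal sketch, assuming a made-up control law `g` (the rule inside it is purely illustrative) and Python's `functools.partial` as the mechanism:

```python
from functools import partial

def g(y, z, m):
    # Hypothetical control law: act (1) only if the private message is 1
    # and the common message is 0; the memory m is unused in this toy rule.
    return 1 if (z == 1 and y == 0) else 0

def gamma(y):
    """Common agent's map: common message y -> partial function g_hat(z, m)."""
    return partial(g, y)

g_hat_idle = gamma(0)   # partial function selected when Y_t = 0
g_hat_busy = gamma(1)   # partial function selected when Y_t = 1
```

Each agent then evaluates the selected partial function on its own private message and memory, for example `g_hat_idle(1, None)`, which agrees with `g(0, 1, None)` by construction.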
Common Agent's viewpoint
[Figure: block diagram of the plant, Agent 1, and Agent 2, with signals $X_t$, $Y_t$, $Z_t^1$, $Z_t^2$, $U_t^1$, $U_t^2$]
Common Agent's viewpoint
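[Figure: the block diagram with a common agent (CA) added; signals $X_t$, $Y_t$, $Z_t^1$, $Z_t^2$, $U_t^1$, $U_t^2$ and the partial functions $(\hat g_t^1, \hat l_t^1)$, $(\hat g_t^2, \hat l_t^2)$]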
Common Agent's viewpoint
[Figure: timing diagram from the common agent's viewpoint; legend: variables ($X_t$, $Q_t$, $Y_t$, $N_t^k$, $Z_t^k$, $U_t^k$, $M_t^k$), observations of the CA ($O_t^0$, $O_t^1$, $O_t^2$), information states ($\pi_t^0$, $\pi_t^1$, $\pi_t^2$), control actions of the CA ($(\hat g_t^1, \hat l_t^1)$, $(\hat g_t^2, \hat l_t^2)$)]
Time $t^0$: $S_t^0 = (X_t, M_{t-1}^1, M_{t-1}^2, U_{t-1}^1, U_{t-1}^2)$, $O_t^0 = Y_t$
Time $t^1$: $S_t^1 = (X_t, M_{t-1}^1, M_{t-1}^2)$, $O_t^1 = -$
Time $t^2$: $S_t^2 = (X_t, M_t^1, M_{t-1}^2, U_t^1)$, $O_t^2 = -$
⊲ State: $S_t^i$
⊲ Obs: $O_t^i$
⊲ Control actions: $(\hat g_t^k, \hat l_t^k)$
From the common agent's viewpoint, $\{S_t^0, S_t^1, S_t^2,\ t = 1, \dots, T\}$ is a POMDP (partially observable Markov decision process).
Sequential decomposition
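$\pi_t^0 = \Pr(S_t^0 \mid \ldots, \hat g^{1,t-1}, \hat l^{1,t-1}, \hat g^{2,t-1}, \hat l^{2,t-1})$
$\pi_t^1 = \Pr(S_t^1 \mid \ldots, \hat g^{1,t-1}, \hat l^{1,t-1}, \hat g^{2,t-1}, \hat l^{2,t-1})$
$\pi_t^2 = \Pr(S_t^2 \mid \ldots, \hat g^{1,t}, \hat l^{1,t}, \hat g^{2,t-1}, \hat l^{2,t-1})$
$V_{T+1}^0(\pi_{T+1}^0) \equiv 0$, and for $t = 1, \dots, T$:
$V_t^0(\pi_t^0) = E\left[ V_t^1(\pi_t^1) \mid \pi_t^0 \right]$,
$V_t^1(\pi_t^1) = \min_{\theta_t^1} E\left[ V_t^2(\pi_t^2) \mid \pi_t^1, \theta_t^1 \right]$,
$V_t^2(\pi_t^2) = \min_{\theta_t^2} E\left[ \rho_t(X_t, U_t^1, U_t^2) + V_{t+1}^0(\pi_{t+1}^0) \mid \pi_t^2, \theta_t^2 \right]$,
where $\theta_t^k = (\hat g_t^k, \hat l_t^k)$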
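The three-stage recursion over $V^0$, $V^1$, $V^2$ can be sketched as a memoized backward induction. This is a minimal sketch under strong assumptions that are not in the slides: information states are taken to be hashable labels from a finite set (true information states are probability distributions), and the transition helpers `step0`, `step1`, `step2` are hypothetical callables supplied by the user.

```python
from functools import lru_cache

def solve(T, thetas1, thetas2, step0, step1, step2):
    """Return V0(t, pi0) for the three-stage recursion.

    step0(t, pi0)        -> [(prob, pi1), ...]
    step1(t, pi1, th1)   -> [(prob, pi2), ...]
    step2(t, pi2, th2)   -> (expected_cost, [(prob, next_pi0), ...])
    """
    @lru_cache(maxsize=None)
    def V0(t, pi0):
        if t == T + 1:
            return 0.0                          # terminal condition
        return sum(p * V1(t, pi1) for p, pi1 in step0(t, pi0))

    @lru_cache(maxsize=None)
    def V1(t, pi1):
        return min(sum(p * V2(t, pi2) for p, pi2 in step1(t, pi1, th))
                   for th in thetas1)

    @lru_cache(maxsize=None)
    def V2(t, pi2):
        best = float("inf")
        for th in thetas2:
            cost, nxt = step2(t, pi2, th)
            best = min(best, cost + sum(p * V0(t + 1, q) for p, q in nxt))
        return best

    return V0

# Hypothetical toy problem with deterministic transitions.
V0 = solve(
    T=3,
    thetas1=("a", "b"),
    thetas2=(0, 1),
    step0=lambda t, pi0: [(1.0, "s")],
    step1=lambda t, pi1, th: [(1.0, th)],
    step2=lambda t, pi2, th: ((1 if pi2 == "a" else 3) + th, [(1.0, "s")]),
)
optimal_cost = V0(1, "s")
```

In the toy problem the cheapest choice at every step costs 1, so the three-step optimal cost is 3.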
Models considered in the paper
⊲ n-agent version of what was presented here
⊲ Model A with no common messages
Example — multiaccess broadcast
[Figure: two transmitters, Tx 1 and Tx 2, sharing a broadcast medium]
⊲ Single user transmits ⇒ successful transmission
⊲ Both users transmit ⇒ packet collision
⊲ Queues with buffer size 1
⊲ Packet held in queue until successful transmission
⊲ Packet arrivals are independent Bernoulli processes
Example — multiaccess broadcast
Both transmitters know if there was no transmission, successful transmission, or a collision
If a packet is available, each transmitter decides whether or not to transmit based on all past channel feedback
⊲ Avoid collisions
⊲ Avoid idle slots
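The channel rules above can be sketched as a small slot-by-slot simulation. This is an illustration only: the function name `simulate_mab`, the convention that arrivals occur at the start of a slot, and the two example policies are assumptions, not part of the slides.

```python
import random

def simulate_mab(p, policy, T, seed=0):
    """Two-user multiaccess broadcast with buffer size 1.

    p[k]      : Bernoulli arrival probability of user k
    policy[k] : maps the tuple of past channel feedback to True (transmit) / False
    Returns (number of delivered packets, list of channel feedback).
    """
    rng = random.Random(seed)
    buf = [0, 0]
    feedback = []
    delivered = 0
    for _ in range(T):
        # Arrivals (assumed to occur at the start of the slot); a packet
        # arriving at a full buffer is dropped.
        for k in range(2):
            if buf[k] == 0 and rng.random() < p[k]:
                buf[k] = 1
        history = tuple(feedback)
        tx = [buf[k] == 1 and bool(policy[k](history)) for k in range(2)]
        if sum(tx) == 1:                 # single transmission succeeds
            buf[tx.index(True)] = 0
            delivered += 1
            feedback.append("success")
        elif sum(tx) == 2:               # both transmit: collision, packets kept
            feedback.append("collision")
        else:
            feedback.append("idle")
    return delivered, feedback

# Hypothetical policies: alternate slots, or always transmit.
alternate = (lambda h: len(h) % 2 == 0, lambda h: len(h) % 2 == 1)
greedy = (lambda h: True, lambda h: True)
alt_result = simulate_mab((1.0, 1.0), alternate, T=4)
greedy_result = simulate_mab((1.0, 1.0), greedy, T=4)
```

With arrival probability 1 for both users, the alternating policy delivers a packet in every slot, while the always-transmit policy collides in every slot.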
History of multiaccess broadcast
“Multiaccess of a slotted channel by finitely many users”, NTC 81.
⊲ Considered symmetric arrival rates
⊲ Restricted attention to “window protocols”
“Decentralized control of multiple access broadcast channels”, CDC 96.
⊲ Considered a relaxation of the problem
⊲ Numerically found the optimal performance of the relaxed problem
⊲ Hluchyj and Gallager's scheme meets this upper bound

⊲ Considered the case of asymmetric arrival rates
⊲ Approximate heuristic solutions for small horizons
Multi-access broadcast is equivalent to Model A
[Figure: two transmitters, Tx 1 and Tx 2, sharing a broadcast medium]
⊲ Tx 1 ≡ Agent 1
⊲ Tx 2 ≡ Agent 2
⊲ Channel feedback ≡ Common message
⊲ Number of packets in each buffer ≡ Private messages $Z_t^1, Z_t^2$
$Z_t^k \in \{0, 1\}$
$g_t^k : \{0, 1\} \to \{\text{Tx}, \text{Don't Tx}\}$
Equivalent to a POMDP with finite state and action spaces
Tractability
⊲ All system variables are finite valued
⊲ All system variables take values in a time-invariant space
⊲ The system is time-homogeneous
Conclusions