Identifying tractable decentralized control problems on the basis of information structure

Aditya Mahajan, Ashutosh Nayyar, and Demosthenis Teneketzis
Dept of EECS, University of Michigan, Ann Arbor
September 26, 2008


SLIDE 1

Identifying tractable decentralized control problems on the basis of information structure

Aditya Mahajan, Ashutosh Nayyar, and Demosthenis Teneketzis
Dept of EECS, University of Michigan, Ann Arbor
September 26, 2008

SLIDE 2

Optimal design of decentralized systems with non-classical information structures

  • Difficulties: Conceptual and computational
  • Results of this paper: Consider two general models of decentralized systems and obtain a sequential decomposition for their finite and infinite horizon cases.
  • Our models encompass

⊲ Standard form (Witsenhausen, 1973)
⊲ k-step delay sharing pattern (Walrand and Varaiya, 1978)
⊲ Generic team model of Witsenhausen (1988)

  • Main idea: viewed appropriately, these models are equivalent to POMDPs with functions as control actions
  • Numerical solution can be obtained using existing techniques for POMDPs

SLIDE 3

Model A for two agents

[Figure: block diagram of the plant, Agent 1, and Agent 2. Signal labels: $X_t$, $Z^1_t$, $Z^2_t$, $Y_t$, $U^1_t$, $U^2_t$.]

SLIDE 4

Model A for two agents

  • Plant: $X_{t+1} = f_t(X_t, U^1_t, U^2_t, W_t)$
  • Observations

⊲ Common message: $Y_t = c_t(X_t, U^1_{t-1}, U^2_{t-1}, Q_t)$
⊲ Private messages: $Z^1_t = h^1_t(X_t, N^1_t)$ and $Z^2_t = h^2_t(X_t, U^1_t, N^2_t)$

  • Agent k

⊲ Control: $U^k_t = g^k_t(Y_t, Z^k_t, M^k_{t-1})$
⊲ Memory update: $M^k_t = l^k_t(Y_t, Z^k_t, M^k_{t-1})$

  • Design ≡ all control and memory update functions of both agents
  • Cost at time t: $\rho_t(X_t, U^1_t, U^2_t)$
  • Cost of a design: $E\big\{ \sum_{t=1}^{T} \rho_t(X_t, U^1_t, U^2_t) \,\big|\, \text{Design} \big\}$
  • Objective: Determine an optimal design
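
The order of events within one time step of Model A can be sketched as a short simulation. This is only an illustration under stated assumptions: the function name `rollout`, the binary noise variables, the time-invariant toy functions, and the zero initial "previous" controls are all hypothetical and do not come from the slides.

```python
import random


def rollout(f, c, h1, h2, design, rho, horizon, x0, m0, seed=0):
    """Simulate Model A once and return the accumulated cost.

    design = ((g1, l1), (g2, l2)): control and memory-update laws.
    All functions are assumed time-invariant here for brevity.
    """
    rng = random.Random(seed)
    (g1, l1), (g2, l2) = design
    x, (m1, m2) = x0, m0
    u1_prev = u2_prev = 0  # assumed initial "previous" controls
    total = 0
    for _ in range(horizon):
        # Assumed binary noise variables W_t, Q_t, N^1_t, N^2_t.
        w, q, n1, n2 = (rng.randint(0, 1) for _ in range(4))
        y = c(x, u1_prev, u2_prev, q)               # common message Y_t
        z1 = h1(x, n1)                              # private message of agent 1
        u1, m1_next = g1(y, z1, m1), l1(y, z1, m1)  # both use M^1_{t-1}
        z2 = h2(x, u1, n2)                          # agent 2 observes after U^1_t
        u2, m2_next = g2(y, z2, m2), l2(y, z2, m2)  # both use M^2_{t-1}
        total += rho(x, u1, u2)                     # cost at time t
        x = f(x, u1, u2, w)                         # plant update
        m1, m2 = m1_next, m2_next
        u1_prev, u2_prev = u1, u2
    return total


# Hypothetical toy instance: every variable is binary.
f = lambda x, u1, u2, w: x ^ w          # state flips with the plant noise
c = lambda x, u1, u2, q: 0              # uninformative common message
h1 = lambda x, n1: x                    # agent 1 sees the state exactly
h2 = lambda x, u1, n2: u1               # agent 2 sees only agent 1's action
copy = lambda y, z, m: z                # act on (and remember) the private message
flip = lambda y, z, m: 1 - z            # deliberately wrong control law
rho = lambda x, u1, u2: int(u1 != x) + int(u2 != x)

good_design = ((copy, copy), (copy, copy))
bad_design = ((copy, copy), (flip, copy))
print(rollout(f, c, h1, h2, good_design, rho, horizon=5, x0=0, m0=(0, 0)))  # 0
print(rollout(f, c, h1, h2, bad_design, rho, horizon=5, x0=0, m0=(0, 0)))   # 5
```

In the toy instance agent 1 signals the state to agent 2 through its action, which is the kind of coupling that makes the information structure non-classical.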
SLIDE 5

Model A for two agents

[Figure: block diagram of the plant, Agent 1, and Agent 2. Signal labels: $X_t$, $Z^1_t$, $Z^2_t$, $Y_t$, $U^1_t$, $U^2_t$.]

  • Salient features

⊲ Non-classical information structures
⊲ Sequential system

[Figure: timing diagram for one time step. Variables, in order: $X_t$, $Q_t$, $Y_t$, $N^1_t$, $Z^1_t$, $U^1_t$, $M^1_t$, $N^2_t$, $Z^2_t$, $U^2_t$, $M^2_t$. Control laws: $(g^1_t, l^1_t)$, $(g^2_t, l^2_t)$.]

SLIDE 6

Consider the model from the point of view of a fictitious common agent

SLIDE 7

Common Agent

Common agent observes all common messages

  • Think of control and memory update functions in two steps

$U^k_t = g^k_t(Y_t, Z^k_t, M^k_{t-1}) = \hat{g}^k_t(Z^k_t, M^k_{t-1})$, where $\hat{g}^k_t = \gamma^k_t(Y_t)$

Similarly, $M^k_t = l^k_t(Y_t, Z^k_t, M^k_{t-1}) = \hat{l}^k_t(Z^k_t, M^k_{t-1})$, where $\hat{l}^k_t = \lambda^k_t(Y_t)$
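
The two-step view is ordinary partial application: fixing the common message $Y_t$ turns the control law into a function of the private data only. A minimal sketch, where the helper name `split_on_common_message` and the toy control law `g` are hypothetical:

```python
def split_on_common_message(g):
    """Turn g(y, z, m) into gamma, where gamma(y) is the partial function g_hat(z, m)."""
    def gamma(y):
        def g_hat(z, m):
            return g(y, z, m)
        return g_hat
    return gamma


# Hypothetical control law over binary variables.
def g(y, z, m):
    return (y and z) or m


gamma = split_on_common_message(g)
g_hat = gamma(1)        # chosen from the common message alone
print(g_hat(1, 0))      # same value as g(1, 1, 0)
```

The same construction applies to the memory update $l^k_t$, giving $\lambda^k_t$ and $\hat{l}^k_t$.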

SLIDE 8

Common Agent's viewpoint

[Figure: block diagram of the plant, Agent 1, and Agent 2. Signal labels: $X_t$, $Z^1_t$, $Z^2_t$, $Y_t$, $U^1_t$, $U^2_t$.]

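SLIDE 9

Common Agent's viewpoint

[Figure: block diagram of the plant, Agent 1, and Agent 2 with a common agent (CA) added. Signal labels: $X_t$, $Z^1_t$, $Z^2_t$, $Y_t$, $U^1_t$, $U^2_t$. CA outputs: $(\hat{g}^1_t, \hat{l}^1_t)$ and $(\hat{g}^2_t, \hat{l}^2_t)$.]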

SLIDE 10

Common Agent's viewpoint

[Figure: timing diagram for one time step, divided into $t_0$, $t_1$, $t_2$. Variables: $X_t$, $Q_t$, $Y_t$, $N^1_t$, $Z^1_t$, $U^1_t$, $M^1_t$, $N^2_t$, $Z^2_t$, $U^2_t$, $M^2_t$. Obs of CA: $O^0_t$, $O^1_t$, $O^2_t$. Info states: $\pi^0_t$, $\pi^1_t$, $\pi^2_t$. Control actions of CA: $(\hat{g}^1_t, \hat{l}^1_t)$, $(\hat{g}^2_t, \hat{l}^2_t)$.]

  • Consider three time steps $t_0$, $t_1$, and $t_2$ in time interval $t$

$S^0_t = (X_t, M^1_{t-1}, M^2_{t-1}, U^1_{t-1}, U^2_{t-1})$, $O^0_t = Y_t$
$S^1_t = (X_t, M^1_{t-1}, M^2_{t-1})$, $O^1_t = -$
$S^2_t = (X_t, M^1_t, M^2_{t-1}, U^1_t)$, $O^2_t = -$

  • POMDP with:

⊲ State: $S^i_t$
⊲ Obs: $O^i_t$
⊲ Control actions: $(\hat{g}^k_t, \hat{l}^k_t)$

From the common agent's viewpoint $\{S^0_t, S^1_t, S^2_t,\ t = 1, \dots, T\}$ is a POMDP (partially observable Markov decision process)
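
Because the common agent's control actions are functions rather than single control values, its action set is the set of all maps from private data to controls. A small sketch of enumerating that set when everything is finite; the name `enumerate_prescriptions` and the binary alphabets are assumptions made for illustration:

```python
from itertools import product


def enumerate_prescriptions(private_obs, memories, controls):
    """List every map (z, m) -> u as a dict, one dict per candidate g_hat."""
    domain = list(product(private_obs, memories))
    return [
        dict(zip(domain, values))
        for values in product(controls, repeat=len(domain))
    ]


# Binary Z, M and U: 2 ** (2 * 2) = 16 candidate partial functions.
prescriptions = enumerate_prescriptions((0, 1), (0, 1), (0, 1))
print(len(prescriptions))  # 16
```

The count is $|U|^{|Z|\,|M|}$, so it grows exponentially in the size of the private information even when each individual alphabet is small.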

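SLIDE 11

Sequential decomposition

  • Information states

$\pi^0_t = \Pr\big(S^0_t \mid Y^t, \hat{g}^{1,t-1}, \hat{l}^{1,t-1}, \hat{g}^{2,t-1}, \hat{l}^{2,t-1}\big)$
$\pi^1_t = \Pr\big(S^1_t \mid Y^t, \hat{g}^{1,t-1}, \hat{l}^{1,t-1}, \hat{g}^{2,t-1}, \hat{l}^{2,t-1}\big)$
$\pi^2_t = \Pr\big(S^2_t \mid Y^t, \hat{g}^{1,t}, \hat{l}^{1,t}, \hat{g}^{2,t-1}, \hat{l}^{2,t-1}\big)$

Here a superscript such as $Y^t$ or $\hat{g}^{1,t-1}$ is read as the history up to the indicated time.

  • Optimality equations

$V^0_{T+1}(\pi^0_{T+1}) \equiv 0$,

and for $t = 1, \dots, T$

$V^0_t(\pi^0_t) = E\big[ V^1_t(\pi^1_t) \mid \pi^0_t \big]$,
$V^1_t(\pi^1_t) = \min_{\theta^1_t} E\big[ V^2_t(\pi^2_t) \mid \pi^1_t, \theta^1_t \big]$,
$V^2_t(\pi^2_t) = \min_{\theta^2_t} E\big[ \rho_t(X_t, U^1_t, U^2_t) + V^0_{t+1}(\pi^0_{t+1}) \mid \pi^2_t, \theta^2_t \big]$,

where $\theta^k_t = (\hat{g}^k_t, \hat{l}^k_t)$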
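
The three-stage recursion can be sketched as a memoized recursion once the information-state transitions are available as finite lists of outcomes. Everything below is an assumption made for illustration: the function `make_value_functions`, the kernel signatures `step0`, `step1`, `step2`, and the toy kernels are hypothetical, and information states are represented by hashable labels rather than actual probability distributions.

```python
from functools import lru_cache


def make_value_functions(T, step0, step1, step2, actions1, actions2):
    """Build V0, V1, V2 for horizon T.

    step0(t, pi0)      -> [(prob, pi1), ...]
    step1(t, pi1, th1) -> [(prob, pi2), ...]
    step2(t, pi2, th2) -> [(prob, cost, next_pi0), ...]
    """
    @lru_cache(maxsize=None)
    def V0(t, pi0):
        if t == T + 1:          # terminal condition: V0_{T+1} = 0
            return 0.0
        return sum(p * V1(t, pi1) for p, pi1 in step0(t, pi0))

    @lru_cache(maxsize=None)
    def V1(t, pi1):
        # Agent 1's pair theta1 = (g_hat1, l_hat1) is chosen first.
        return min(
            sum(p * V2(t, pi2) for p, pi2 in step1(t, pi1, th1))
            for th1 in actions1
        )

    @lru_cache(maxsize=None)
    def V2(t, pi2):
        # Agent 2's pair theta2 is chosen next; cost accrues, then time advances.
        return min(
            sum(p * (cost + V0(t + 1, nxt)) for p, cost, nxt in step2(t, pi2, th2))
            for th2 in actions2
        )

    return V0, V1, V2


# Hypothetical toy kernels: unit cost per step plus a penalty if theta2 != theta1.
step0 = lambda t, pi0: [(1.0, pi0)]
step1 = lambda t, pi1, th1: [(1.0, (pi1, th1))]
step2 = lambda t, pi2, th2: [(1.0, 1.0 + (pi2[1] != th2), pi2[0])]

V0, V1, V2 = make_value_functions(3, step0, step1, step2, (0, 1), (0, 1))
print(V0(1, "start"))  # 3.0
```

Memoization only helps when finitely many information states are reachable; in general the information states are probability distributions, and a POMDP solver would be needed instead of this exhaustive recursion.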

SLIDE 12

Models considered in the paper

  • Model A

⊲ n-agent version of what was presented here

  • Model B

⊲ Model A with no common messages

  • Also consider infinite horizon problems
SLIDE 13

Example — multiaccess broadcast

[Figure: Tx 1 and Tx 2 sharing a broadcast medium.]

  • MAB Channel

⊲ Single user transmits ⇒ successful transmission
⊲ Both users transmit ⇒ packet collision

  • Transmitters

⊲ Queues with buffer size 1
⊲ Packet held in queue until successful transmission
⊲ Packet arrivals are independent Bernoulli processes

SLIDE 14

Example — multiaccess broadcast

  • Channel feedback

Both transmitters know if there was no transmission, successful transmission, or a collision

  • Policy of transmitters

If packet is available, decide whether or not to transmit based on all past channel feedback

  • Objective: Maximize throughput

⊲ Avoid collisions
⊲ Avoid idle slots

SLIDE 15

History of multiaccess broadcast

  • Hluchyj and Gallager, “Multiaccess of a slotted channel by finitely many users”, NTC 81.

⊲ Considered symmetric arrival rates
⊲ Restricted attention to “window protocols”

  • Ooi and Wornell, “Decentralized control of multiple access broadcast channels”, CDC 96.

⊲ Considered a relaxation of the problem
⊲ Numerically found the optimal performance of the relaxed problem
⊲ Hluchyj and Gallager's scheme meets this upper bound

  • AI Literature

⊲ Considers the case of asymmetric arrival rates
⊲ Approximate heuristic solutions for small horizons

SLIDE 16

Multi-access broadcast is equivalent to Model A

[Figure: Tx 1 and Tx 2 sharing a broadcast medium.]

Tx 1 ≡ Agent 1
Tx 2 ≡ Agent 2
Channel feedback ≡ Common message
Number of packets in each buffer ≡ Private messages

  • Information state: $\pi_t = \Pr\big(Z^1_t, Z^2_t \mid \text{feedback}\big)$, with $Z^k_t \in \{0, 1\}$
  • Action Space: $\hat{g}^k_t : \{0, 1\} \to \{\text{Tx}, \text{Don't Tx}\}$

Equivalent to a POMDP with finite state and action spaces
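
For this example the information state is a distribution over the four buffer configurations and can be updated recursively from the channel feedback. The sketch below is one possible Bayes update under assumptions that are not on the slides: the slot timing (transmission first, then arrivals), arrivals to a full buffer being dropped, the arrival probabilities `p`, and the encoding of a partial function as a bit `a[k]` meaning "transmit if a packet is present". All function names are hypothetical.

```python
from itertools import product

STATES = list(product((0, 1), repeat=2))      # (z1, z2): packets in each buffer
FEEDBACK = ("idle", "success", "collision")   # indexed by number of transmissions


def feedback(z, a):
    """Channel feedback when user k transmits iff z[k] == 1 and a[k] == 1."""
    return FEEDBACK[z[0] * a[0] + z[1] * a[1]]


def after_transmission(z, a):
    """Buffers after the slot: only a successful transmitter empties its buffer."""
    if feedback(z, a) != "success":
        return z
    return tuple(0 if (zk and ak) else zk for zk, ak in zip(z, a))


def arrival_prob(z_mid, z_next, p):
    """Probability of z_next given z_mid; arrivals to a full buffer are dropped."""
    prob = 1.0
    for zm, zn, pk in zip(z_mid, z_next, p):
        if zm == 1:
            prob *= 1.0 if zn == 1 else 0.0
        else:
            prob *= pk if zn == 1 else 1.0 - pk
    return prob


def belief_update(belief, a, fb, p):
    """Posterior over next-slot buffers given prior, partial functions a, feedback fb."""
    new = {z: 0.0 for z in STATES}
    for z, weight in belief.items():
        if weight == 0.0 or feedback(z, a) != fb:
            continue  # this buffer state is inconsistent with the feedback
        z_mid = after_transmission(z, a)
        for z_next in STATES:
            new[z_next] += weight * arrival_prob(z_mid, z_next, p)
    total = sum(new.values())
    if total == 0.0:
        raise ValueError("feedback has zero probability under this belief")
    return {z: v / total for z, v in new.items()}


uniform = {z: 0.25 for z in STATES}
posterior = belief_update(uniform, a=(1, 0), fb="success", p=(0.3, 0.6))
print(posterior)
```

With only four states, three feedback symbols, and two meaningful choices per transmitter, both the belief update and the search over actions are finite, which is what makes the resulting POMDP amenable to standard solvers.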

SLIDE 17

Tractability

  • Finite horizon problem

⊲ All system variables are finite valued

  • Infinite horizon

⊲ All system variables take values in a time-invariant space
⊲ The system is time-homogeneous

Conclusions

  • Sequential decomposition of two general models of decentralized systems
  • Equivalent to POMDPs (sometimes to POMDPs with finite state and action spaces)
  • Harder to solve than POMDPs due to expansion of state and action spaces.
SLIDE 18

Thank you