Optimal decentralized control
of coupled subsystems
with control sharing
Aditya Mahajan
McGill University
IEEE Conference on Decision and Control, 2011
A Mahajan (McGill) Control sharing info struc 1
Notation. Random variables: $X$; realizations: $x$; state spaces: $\mathcal{X}$.
$X^i_t$ means that variable $X$ belongs to subsystem $i$ at time $t$.
$X_{1:t} = (X_1, X_2, \dots, X_t)$ and $\mathbf{X} = (X^1, X^2, \dots, X^n)$.
[Figure: coupled subsystems $1, 2, \dots, n$, each fed back the previous joint control action $\mathbf{U}_{t-1}$]
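Dynamics: $X^i_{t+1} = f^i_t(X^i_t, \mathbf{U}_t, W^i_t)$
Controller $i$: $U^i_t = g^i_t(X^i_{1:t}, \mathbf{U}_{1:t-1})$
Objective: $\min_{\text{all policies } \mathbf{g}} \mathbb{E}\big[\sum_{t=1}^{T} c_t(\mathbf{X}_t, \mathbf{U}_t)\big]$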
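The following is a minimal Python sketch of this information structure. It is an illustration under assumptions and does not reproduce the talk's method. The callables `f`, `g`, `c`, `noise` and the functions `simulate` and `estimate_cost` are hypothetical names. Controller $i$ receives only its own state history $X^i_{1:t}$ and the shared past actions $\mathbf{U}_{1:t-1}$. The expected total cost of a fixed policy profile is then estimated by Monte Carlo.

```python
import random
from typing import List, Tuple


def simulate(f, g, c, x0, noise, horizon, rng):
    """Cost of one sample path, sum_t c_t(X_t, U_t), under control sharing.

    All callables are hypothetical placeholders for this sketch:
      f(i, t, x_i, u, w_i) -> X^i_{t+1}    dynamics f^i_t of subsystem i
      g(i, t, own_hist, shared) -> U^i_t   policy g^i_t of controller i
      c(t, x, u) -> float                  stage cost c_t
      noise(i, t, rng) -> W^i_t            noise of subsystem i
    Time runs t = 0, ..., horizon - 1 here (t = 1, ..., T on the slides).
    """
    n = len(x0)
    x = list(x0)
    own_hist = [[x0[i]] for i in range(n)]  # X^i_{1:t}, private to controller i
    shared: List[Tuple] = []                # U_{1:t-1}, common to all controllers
    total = 0.0
    for t in range(horizon):
        # controller i sees only its own state history and the shared past actions
        u = tuple(g(i, t, tuple(own_hist[i]), tuple(shared)) for i in range(n))
        total += c(t, tuple(x), u)
        # subsystem i evolves from its own state, the joint action, and its own noise
        x = [f(i, t, x[i], u, noise(i, t, rng)) for i in range(n)]
        for i in range(n):
            own_hist[i].append(x[i])
        shared.append(u)  # U_t becomes common information at the next step
    return total


def estimate_cost(f, g, c, x0, noise, horizon, runs=1000, seed=0):
    """Monte Carlo estimate of E[sum_t c_t(X_t, U_t)] for fixed policies."""
    rng = random.Random(seed)
    paths = [simulate(f, g, c, x0, noise, horizon, rng) for _ in range(runs)]
    return sum(paths) / len(paths)
```

Passing histories as tuples keeps each controller from mutating the shared record. A policy that ignores all but $X^i_t$ and $\mathbf{U}_{1:t-1}$ fits the same interface.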
Point-to-point real-time source coding, multi-terminal source coding with feedback, some classes of multiple access channels with feedback
Multi-access broadcast, some classes of decentralized scheduling and routing.
Paging and registration in cellular networks
Controller $i$: $U^i_t = g^i_t(X^i_{1:t}, \mathbf{U}_{1:t-1})$
Is part of this data redundant? Can part of this data be compressed to a sufficient statistic?
How does the current control action affect future estimation? What information does controller $i$ communicate to controller $j$ via its control action?
Considered the LQG version of the problem. Exploit the fact that the action space is continuous and compact to embed the observations in the control actions. Reduces to the one-step delayed sharing pattern.
Delayed state sharing: Aicardi, Davoli, and Minciardi, 1987
Delayed (observation) sharing: Witsenhausen, 1971; Varaiya and Walrand, 1979; Nayyar, Mahajan, and Teneketzis, 2011
Periodic sharing: Ooi, Verbout, Ludwig, and Wornell, 1997
Belief sharing: Yüksel, 2009
Partial history sharing: Mahajan, Nayyar, and Teneketzis, 2008
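$X^i_{1:t-1}$ is redundant for optimal performance. Without loss of optimality, $U^i_t = g^i_t(X^i_t, \mathbf{U}_{1:t-1})$.
Define $\Pi^i_t(x) = \mathbb{P}(X^i_t = x \mid \mathbf{U}_{1:t-1})$ and $\boldsymbol{\Pi}_t = (\Pi^1_t, \dots, \Pi^n_t)$.
$\boldsymbol{\Pi}_t$ is a sufficient statistic of $\mathbf{U}_{1:t-1}$ for optimal performance. Without loss of optimality, $U^i_t = g^i_t(X^i_t, \boldsymbol{\Pi}_t)$.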
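$\Pi^i_t$ can be updated recursively. The sketch below illustrates this. It is derived from the definitions rather than taken from the talk, and it rests on several assumptions:

- $\mathcal{X}^i$ is finite.
- The prescription $\gamma^i_t(\cdot) = g^i_t(\cdot, \boldsymbol{\Pi}_t)$ is known to everyone.
- A hypothetical `kernel(x, u)` returns $\mathbb{P}(X^i_{t+1} = \cdot \mid X^i_t = x, \mathbf{U}_t = u)$.

Observing $U^i_t$ rules out the states $x$ with $\gamma^i_t(x) \neq U^i_t$. The remaining posterior is then pushed through the dynamics of subsystem $i$. By the conditional independence of the state processes, only $U^i_t$ carries information about $X^i_t$.

```python
def update_marginal(pi_i, gamma_i, u, i, kernel):
    """Return Pi^i_{t+1} from Pi^i_t after observing the joint action u = U_t.

    pi_i:    dict {x: prob}, the belief Pi^i_t
    gamma_i: dict {x: action}, the prescription gamma^i_t(.)
    u:       tuple, the observed joint action U_t
    i:       index of the subsystem
    kernel:  hypothetical kernel(x, u) -> {y: prob} for subsystem i
    """
    # Bayes step: keep only states consistent with the observed own action U^i_t
    post = {x: p for x, p in pi_i.items() if gamma_i[x] == u[i]}
    z = sum(post.values())
    if z == 0:
        raise ValueError("observed action has zero probability under the prescription")
    # prediction step: push the normalized posterior through subsystem i's dynamics
    nxt = {}
    for x, p in post.items():
        for y, q in kernel(x, u).items():
            nxt[y] = nxt.get(y, 0.0) + p / z * q
    return nxt
```

An uninformative prescription (the same action in every state) leaves only the prediction step.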
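The state processes are conditionally independent given the past control actions:
$\mathbb{P}(\mathbf{X}_{1:t} = \mathbf{x}_{1:t} \mid \mathbf{U}_{1:t}) = \prod_{i=1}^{n} \mathbb{P}(X^i_{1:t} = x^i_{1:t} \mid \mathbf{U}_{1:t})$.
Fix $\mathbf{g}^{-i}$ and consider the optimal design of $g^i$. Let $S^i_t = (X^i_t, \mathbf{U}_{1:t-1})$. Then $\{S^i_t, t = 1, 2, \dots\}$ is a controlled Markov process with control action $U^i_t$:
$\mathbb{P}(S^i_{t+1} \mid S^i_{1:t}, U^i_{1:t}) = \mathbb{P}(S^i_{t+1} \mid S^i_t, U^i_t)$,
$\mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) \mid S^i_{1:t}, U^i_{1:t}] = \mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) \mid S^i_t, U^i_t]$.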
Controller $i$: $U^i_t = g^i_t(X^i_{1:t}, \mathbf{U}_{1:t-1})$
Without loss of optimality: $U^i_t = g^i_t(S^i_t) = g^i_t(X^i_t, \mathbf{U}_{1:t-1})$
The data available to the controller still grows with time.
Common information approach: general idea proposed in (Mahajan, Nayyar, and Teneketzis 2008)
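Each controller applies a prescription, $U^i_t = \gamma^i_t(X^i_t)$, where $\gamma^i_t(\cdot) = g^i_t(\cdot, \mathbf{U}_{1:t-1})$.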
1. The coordinated system is a POMDP.
2. Identify the structure of optimal coordination strategies for the coordinated system.
3. Show that the coordinated system is equivalent to the original model.
4. Translate the structure of optimal coordination strategies to the original model.
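[Figure: coordinated system with two subsystems; a coordinator $\psi_t$ maps $\mathbf{U}_{1:t-1}$ to prescriptions $\gamma^1_t, \gamma^2_t$, which act on $X^1_t, X^2_t$]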
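State: $\mathbf{X}_t = (X^1_t, \dots, X^n_t)$
Observations: $\mathbf{U}_{t-1} = (U^1_{t-1}, \dots, U^n_{t-1})$
Control actions: $\boldsymbol{\gamma}_t = (\gamma^1_t, \dots, \gamma^n_t)$
Coordination rule: $\psi_t : \big(\prod_{i=1}^{n} \mathcal{U}^i\big)^{t-1} \to \prod_{i=1}^{n} (\mathcal{X}^i \to \mathcal{U}^i)$, with $\boldsymbol{\gamma}_t = \psi_t(\mathbf{U}_{1:t-1})$
Define $\Theta_t = \mathbb{P}(\text{state} \mid \text{history of observations}) = \mathbb{P}(\mathbf{X}_t \mid \mathbf{U}_{1:t-1})$. Then, without loss of optimality, $\boldsymbol{\gamma}_t = \psi_t(\Theta_t)$.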
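In the coordinated system, $\Theta_t$ is the belief state of a POMDP, so it can be updated by a standard Bayes filter. The following minimal sketch makes these assumptions:

- Finite state and action spaces.
- Hypothetical per-subsystem kernels `kernels[i](x_i, u)`.
- Prescriptions passed as dictionaries.

The function name `update_joint_belief` is also hypothetical.

```python
from itertools import product


def update_joint_belief(theta, gammas, u, kernels):
    """Bayes filter for Theta_{t+1} = P(X_{t+1} | U_{1:t}) in the coordinated POMDP.

    theta:   dict {joint state tuple x: prob}, the belief Theta_t
    gammas:  list of dicts, gammas[i][x_i] = gamma^i_t(x_i) (the prescriptions)
    u:       observed joint action U_t (a tuple)
    kernels: list of hypothetical callables, kernels[i](x_i, u) -> {y_i: prob}
    """
    n = len(gammas)
    new = {}
    for x, p in theta.items():
        # discard joint states whose prescribed actions disagree with the observation
        if p == 0 or any(gammas[i][x[i]] != u[i] for i in range(n)):
            continue
        # given U_t, the subsystems move independently, so multiply the kernels
        for combo in product(*(kernels[i](x[i], u).items() for i in range(n))):
            y = tuple(yi for yi, _ in combo)
            q = p
            for _, qi in combo:
                q *= qi
            new[y] = new.get(y, 0.0) + q
    z = sum(new.values())
    if z == 0:
        raise ValueError("observed action has zero probability under the prescriptions")
    return {y: q / z for y, q in new.items()}
```

The coordinator needs only the observed actions and its own prescriptions. The filter never touches any private state.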
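$V_t(\theta) = \min_{\boldsymbol{\gamma}} \mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) + V_{t+1}(\Theta_{t+1}) \mid \Theta_t = \theta, \boldsymbol{\gamma}_t = \boldsymbol{\gamma}]$, with $V_{T+1} \equiv 0$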
The optimization at each step is a functional optimization problem (over prescriptions). In our opinion, functional optimization at each step is the only way to circumvent the issue of signaling.
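To make the functional optimization concrete, here is a brute-force sketch for tiny finite problems. At each reachable belief it enumerates every prescription $\boldsymbol{\gamma} = (\gamma^1, \dots, \gamma^n)$. It assumes a time-invariant cost and hypothetical `kernel` and `cost` callables. Its cost is exponential in the state-space sizes, so it is only an illustration.

```python
from functools import lru_cache
from itertools import product


def coordinator_value(theta0, states, actions, kernel, cost, horizon):
    """Optimal expected total cost of the coordinated system, by brute force.

    theta0:  dict {joint state tuple: prob}, the initial belief Theta_1
    states:  states[i] is the list of states of subsystem i
    actions: actions[i] is the list of actions of subsystem i
    kernel:  hypothetical kernel(i, x_i, u) -> {y_i: prob} for subsystem i
    cost:    hypothetical time-invariant stage cost cost(x, u)
    """
    n = len(states)
    # every prescription gamma^i : X^i -> U^i, as a tuple indexed like states[i]
    prescriptions = [list(product(actions[i], repeat=len(states[i]))) for i in range(n)]

    def apply(gamma, x):
        return tuple(gamma[i][states[i].index(x[i])] for i in range(n))

    @lru_cache(maxsize=None)
    def value(t, theta):  # theta: frozenset of (joint state, prob) pairs
        if t == horizon:
            return 0.0  # V_{T+1} = 0
        best = float("inf")
        for gamma in product(*prescriptions):  # the functional optimization step
            stage, by_action = 0.0, {}
            for x, p in theta:
                u = apply(gamma, x)
                stage += p * cost(x, u)
                dist = by_action.setdefault(u, {})
                for combo in product(*(kernel(i, x[i], u).items() for i in range(n))):
                    y = tuple(yi for yi, _ in combo)
                    q = p
                    for _, qi in combo:
                        q *= qi
                    dist[y] = dist.get(y, 0.0) + q
            future = 0.0
            for dist in by_action.values():  # branch on the observed joint action U_t
                pu = sum(dist.values())
                if pu > 0:
                    post = frozenset((y, q / pu) for y, q in dist.items() if q > 0)
                    future += pu * value(t + 1, post)
            best = min(best, stage + future)
        return best

    return value(0, frozenset((x, p) for x, p in theta0.items() if p > 0))
```

Consider a toy instance. Subsystem 1 has a single state, $X^2_t$ is a fixed fair coin, and the cost is 1 whenever $U^1_t \neq X^2_t$. For any horizon $T \geq 1$ this returns $0.5$, because controller 2 can reveal $X^2_1$ through $U^2_1$, and only the first guess is uncertain. This is signaling through the control actions.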
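Without loss of optimality, $U^i_t = \gamma^i_t(X^i_t) = \psi^i_t(\Theta_t)(X^i_t) = g^i_t(X^i_t, \Theta_t)$.
Solve the DP for the coordinated system, and choose $g^i_t(X^i_t, \Theta_t) = \psi^i_t(\Theta_t)(X^i_t)$.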
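The state processes are conditionally independent given the past control actions:
$\mathbb{P}(\mathbf{X}_{1:t} = \mathbf{x}_{1:t} \mid \mathbf{U}_{1:t}) = \prod_{i=1}^{n} \mathbb{P}(X^i_{1:t} = x^i_{1:t} \mid \mathbf{U}_{1:t})$.
Hence the coordinator's belief factorizes:
$\Theta_t(\mathbf{x}) = \mathbb{P}(\mathbf{X}_t = \mathbf{x} \mid \mathbf{U}_{1:t-1}) = \prod_{i=1}^{n} \Pi^i_t(x^i)$.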
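Without loss of optimality, $U^i_t = g^i_t(X^i_t, \Theta_t) = \tilde{g}^i_t(X^i_t, \boldsymbol{\Pi}_t)$.
Significant reduction in size: $\Theta_t \in \Delta(\mathcal{X}^1 \times \dots \times \mathcal{X}^n)$, while $\boldsymbol{\Pi}_t \in \Delta(\mathcal{X}^1) \times \dots \times \Delta(\mathcal{X}^n)$.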
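$V_t(\boldsymbol{\pi}) = \min_{\boldsymbol{\gamma}} \mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) + V_{t+1}(\boldsymbol{\Pi}_{t+1}) \mid \boldsymbol{\Pi}_t = \boldsymbol{\pi}, \boldsymbol{\gamma}_t = \boldsymbol{\gamma}]$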
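This small sketch shows why the factorization matters, assuming finite state spaces. It rebuilds $\Theta_t$ from the marginals as a product. It then counts the free parameters of each representation: $|\mathcal{X}^1| \cdots |\mathcal{X}^n| - 1$ for $\Theta_t$ versus $\sum_i (|\mathcal{X}^i| - 1)$ for $\boldsymbol{\Pi}_t$. The function names are hypothetical.

```python
from itertools import product
from math import prod


def joint_from_marginals(marginals):
    """Theta_t(x) = prod_i Pi^i_t(x^i); marginals[i] is a dict {x_i: prob}."""
    theta = {}
    for combo in product(*(m.items() for m in marginals)):
        x = tuple(xi for xi, _ in combo)
        theta[x] = prod(p for _, p in combo)
    return theta


def free_parameters(sizes):
    """Free parameters of Theta_t versus Pi_t, for |X^i| = sizes[i]."""
    joint = prod(sizes) - 1               # simplex Delta(X^1 x ... x X^n)
    factored = sum(s - 1 for s in sizes)  # product Delta(X^1) x ... x Delta(X^n)
    return joint, factored
```

With ten binary subsystems, `free_parameters([2] * 10)` returns `(1023, 10)`. The joint belief grows exponentially in $n$, while the factored one grows linearly.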
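Original: $U^i_t = g^i_t(X^i_{1:t}, \mathbf{U}_{1:t-1})$
Using the person-by-person approach: $U^i_t = g^i_t(X^i_t, \mathbf{U}_{1:t-1})$
Using the common information approach of (NMT 2008, 2011): $U^i_t = g^i_t(X^i_t, \Theta_t)$, where $\Theta_t = \mathbb{P}(\mathbf{X}_t \mid \mathbf{U}_{1:t-1})$
Using specific conditional independence due to the dynamics: $U^i_t = g^i_t(X^i_t, \boldsymbol{\Pi}_t)$, where $\Pi^i_t = \mathbb{P}(X^i_t \mid \mathbf{U}_{1:t-1})$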
$X^i_t \in \{0, 1\}$: number of packets in queue $i$
$W^i_t \in \{0, 1\}$: number of arrivals, $W^i_t \sim \text{Ber}(p)$
$U^i_t \in \{0, 1\}$: number of transmitted packets
Throughput: $r_t = U^1_t(1 - U^2_t) + (1 - U^1_t)U^2_t$
State update: $X^i_{t+1} = \min(X^i_t - U^i_t r_t + W^i_t, 1)$
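Below is a minimal simulation of the two-user example. It assumes a single-packet buffer, so the state update is taken as $X^i_{t+1} = \min(X^i_t - U^i_t r_t + W^i_t, 1)$. It also assumes that a user never transmits from an empty queue. The function `simulate_mab` and its policy interface are hypothetical.

```python
import random


def simulate_mab(policy, p, horizon, seed=0, x0=(1, 1)):
    """Total throughput of a two-user multi-access broadcast over `horizon` slots.

    policy(i, t, x_i, past) -> 0/1 is a hypothetical policy interface: user i
    sees its own queue x_i and the shared past actions `past`.
    Assumption: single-packet buffer, X' = min(X - U * r + W, 1).
    """
    rng = random.Random(seed)
    x = list(x0)
    past = []
    total = 0
    for t in range(horizon):
        # a user cannot transmit more packets than it has queued
        u = [min(policy(i, t, x[i], tuple(past)), x[i]) for i in range(2)]
        r = u[0] * (1 - u[1]) + (1 - u[0]) * u[1]  # success iff exactly one transmits
        total += r
        w = [1 if rng.random() < p else 0 for _ in range(2)]  # Ber(p) arrivals
        x = [min(x[i] - u[i] * r + w[i], 1) for i in range(2)]
        past.append(tuple(u))
    return total
```

For instance, with $p = 1$, users that alternate slots deliver one packet per slot. Two users that always transmit collide in every slot and get zero throughput.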
Symmetric arrivals: Hluchyj and Gallager, 1981, feasible lower bound.
Symmetric arrivals: Ooi and Wornell, 1996, genie-aided upper bound that numerically matched the lower bound.
Asymmetric arrivals: used as a benchmark problem in the AI community (Hansen et al., 2004; Bernstein et al., 2005; Szer and Charpillet, 2006) for numerical algorithms for DEC-POMDPs.
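$\Pi^i_t$ is equivalent to $\pi^i_t := \Pi^i_t(1) = \mathbb{P}(X^i_t = 1 \mid \mathbf{U}_{1:t-1}) \in [0, 1]$
$\gamma^i_t$ is equivalent to $a^i_t := \gamma^i_t(1) \in \{0, 1\}$
Structure of optimal policy: $U^i_t = X^i_t \cdot a^i_t$, where $(a^1_t, a^2_t) = \psi_t(\pi^1_t, \pi^2_t)$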
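The scalar beliefs $\pi^i_t$ can be propagated by case analysis. For short horizons, the DP over $(\pi^1_t, \pi^2_t)$ can then be solved by brute force over $(a^1_t, a^2_t) \in \{0, 1\}^2$. This is a numerical check only, and it makes these assumptions:

- Single-buffer update $X^i_{t+1} = \min(X^i_t - U^i_t r_t + W^i_t, 1)$.
- Arrivals $W^i_t \sim \text{Ber}(p)$.
- The objective is to maximize expected total throughput.

The names `next_belief` and `mab_value` are hypothetical.

```python
from functools import lru_cache
from itertools import product


def next_belief(pi, a, u_i, r, p):
    """pi^i_{t+1} = P(X^i_{t+1} = 1 | U_{1:t}) from pi^i_t, a^i_t, U^i_t and r_t.

    Assumption: single-packet buffer, X' = min(X - U * r + W, 1), W ~ Ber(p).
    """
    if a == 0:               # user stayed silent: the queue can only fill up
        return pi + (1 - pi) * p
    if u_i == 1 and r == 0:  # collision: the packet is still in the queue
        return 1.0
    return p                 # queue known to be empty now, refilled w.p. p


def mab_value(pi1, pi2, p, horizon):
    """Maximum expected throughput over `horizon` slots from beliefs (pi1, pi2)."""

    @lru_cache(maxsize=None)
    def value(t, b1, b2):
        if t == horizon:
            return 0.0
        best = float("-inf")
        for a1, a2 in product((0, 1), repeat=2):  # candidate (a^1_t, a^2_t)
            total = 0.0
            for x1, x2 in product((0, 1), repeat=2):
                px = (b1 if x1 else 1 - b1) * (b2 if x2 else 1 - b2)
                if px == 0:
                    continue
                u1, u2 = a1 * x1, a2 * x2  # U^i_t = X^i_t * a^i_t
                r = u1 * (1 - u2) + (1 - u1) * u2
                total += px * (r + value(t + 1, next_belief(b1, a1, u1, r, p),
                                         next_belief(b2, a2, u2, r, p)))
            best = max(best, total)
        return best

    return value(0, pi1, pi2)
```

With both queues known to be full and $p = 1$, the best choice is for exactly one user to transmit in each slot, so `mab_value(1.0, 1.0, 1.0, 2)` is `2.0`.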
Notation: for any $x \in [0, 1]$, let $\bar{x} = 1 - x$. Characteristic polynomial: $P_k(x) = \dots$. Let $\beta_k$ be the root of $P_k$ in $[0, 1]$, and let $\zeta$ be the root of $\dots$. Optimal performance: $J^* = \dots$
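When $p > \zeta$: $\psi^*(\pi^1, \pi^2) = (1, 0)$ if $\pi^1 > \pi^2$; $(0, 1)$ if $\pi^1 < \pi^2$; $(1, 0)$ or $(0, 1)$ if $\pi^1 = \pi^2$.
When $p < \zeta$, let $k \in \mathbb{N}$ be such that $\beta_{k+1} < p \le \beta_k$. Then $\psi^*(\pi^1, \pi^2) = \dots$ if $\pi^1 \le \bar{\beta}_k$ and $\pi^2 \le \bar{\beta}_k$; $(1, 0)$ if $\pi^1 > \max(\bar{\beta}_k, \pi^2)$; $(0, 1)$ if $\pi^2 > \max(\bar{\beta}_k, \pi^1)$; $(1, 0)$ or $(0, 1)$ if $\pi^1 = \pi^2 = \dots$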
Non-classical information structure. Use properties of the system dynamics and the common information approach of (Mahajan, Nayyar, Teneketzis 2008) to find the structure of optimal controllers and a dynamic programming decomposition. This allows using standard tools from stochastic control to analyze specific applications.
Subclasses of decentralized control problems with signaling are solvable! Each step of the DP is a functional optimization problem.