Optimal decentralized control of coupled subsystems with control sharing

SLIDE 1

Optimal decentralized control of coupled subsystems with control sharing

Aditya Mahajan

McGill University

IEEE Conference on Decision and Control, 2011

SLIDE 2

A Mahajan (McGill) Control sharing info struc 1

Notation

Random variables: $X$, realizations: $x$, state spaces: $\mathcal{X}$.

$a^i_t$ means that variable $a$ belongs to subsystem $i$ at time $t$.

$a_{1:t} = (a_1, a_2, \dots, a_t)$ and $\mathbf{a} = (a^1, a^2, \dots, a^n)$.

SLIDE 3

System Model

[Figure: block diagram of subsystems $1, 2, \dots, n$, each with its own controller; every controller also receives the previous joint control action $\mathbf{u}_{t-1}$.]

Control-coupled subsystems

$x^i_{t+1} = f^i_t(x^i_t, \mathbf{u}_t, w^i_t)$

Controller with control sharing

$u^i_t = g^i_t(x^i_{1:t}, \mathbf{u}_{1:t-1})$

Objective

$\min_{\text{all policies } \mathbf{g}} \mathbb{E}\Big[ \sum_{t=1}^{T} c_t(\mathbf{x}_t, \mathbf{u}_t) \Big]$
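The information structure on this slide can be made concrete with a short simulation loop. The sketch below is illustrative only: the dynamics `f`, policies `g`, cost `c`, the binary noise, and the function name `rollout` are hypothetical stand-ins, not part of the talk. It shows that controller $i$ acts on its own state history plus the shared history of joint control actions, and that the subsystems interact only through the joint action.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = int
Action = int
Joint = Tuple[Action, ...]


def rollout(
    f: Sequence[Callable[[State, Joint, int], State]],
    g: Sequence[Callable[[List[State], List[Joint]], Action]],
    c: Callable[[Tuple[State, ...], Joint], float],
    x1: Sequence[State],
    horizon: int,
    rng: random.Random,
) -> float:
    """Total cost of one sample path under control sharing (hypothetical model)."""
    n = len(x1)
    x_hist: List[List[State]] = [[x] for x in x1]  # local state history per subsystem
    u_hist: List[Joint] = []                       # shared history of joint actions
    total = 0.0
    for _ in range(horizon):
        # Controller i sees only its own states and the shared past controls.
        u = tuple(g[i](x_hist[i], u_hist) for i in range(n))
        x = tuple(h[-1] for h in x_hist)
        total += c(x, u)
        # Subsystems are coupled only through the joint control action u.
        for i in range(n):
            w = rng.randint(0, 1)  # assumed binary noise, for illustration
            x_hist[i].append(f[i](x[i], u, w))
        u_hist.append(u)
    return total
```

Passing the full histories to each `g[i]` keeps the sketch faithful to the original information structure; a policy that only needs the current state can simply read `x_hist[i][-1]`.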

SLIDE 4

Some applications

Feedback communication systems (physical layer)

Point-to-point real-time source coding, multi-terminal source coding with feedback, some classes of multiple access channel with feedback

Queueing networks (media access layer)

Multi-access broadcast, some classes of decentralized scheduling and routing.

Cellular networks

Paging and registration in cellular networks

SLIDE 5

Conceptual difficulties

The system has a non-classical information structure

Data at each controller is increasing with time

$u^i_t = g^i_t(x^i_{1:t}, \mathbf{u}_{1:t-1})$

Is part of this data redundant? Can part of this data be compressed to a sufficient statistic?

Multi-stage decision making

How does the current control action affect future estimation? What information does controller $i$ communicate to controller $j$ via its control action?

SLIDE 6

Literature Overview

Control sharing info-structure (Bismut, 1972, Sandell and Athans, 1974)

Considered the LQG version of the problem
Exploits the fact that the action space is continuous and compact to embed the observations in control
Reduces to one-step delayed sharing pattern

Other non-classical info-structures with sharing

Delayed state sharing: Aicardi, Davoli, and Minciardi, 1987
Delayed (observation) sharing: Witsenhausen, 1971; Varaiya and Walrand, 1979; Nayyar, Mahajan, and Teneketzis, 2011
Periodic sharing: Ooi, Verbout, Ludwig, and Wornell, 1997
Belief sharing: Yüksel, 2009
Partial history sharing: Mahajan, Nayyar, and Teneketzis, 2008

SLIDE 7

Outline of the results

First structural result (based on person-by-person optimality)

$x^i_{1:t-1}$ is redundant for optimal performance.

Without loss of optimality, $u^i_t = g^i_t(x^i_t, \mathbf{u}_{1:t-1})$

Second structural result (based on the common information approach of MNT 2008)

Define $\Pi^i_t(x) = \mathbb{P}(X^i_t = x \mid \mathbf{U}_{1:t-1})$ and $\boldsymbol{\Pi}_t = (\Pi^1_t, \dots, \Pi^n_t)$.

$\boldsymbol{\pi}_t$ is a sufficient statistic of $\mathbf{u}_{1:t-1}$ for optimal performance.

Without loss of optimality, $u^i_t = g^i_t(x^i_t, \boldsymbol{\pi}_t)$

Dynamic programming decomposition

SLIDE 8

Structural result based on person-by-person optimality

Main lemma

The state processes are conditionally independent given the past control actions.

$\mathbb{P}(\mathbf{X}_{1:t} = \mathbf{x}_{1:t} \mid \mathbf{U}_{1:t}) = \prod_{i=1}^{n} \mathbb{P}(X^i_{1:t} = x^i_{1:t} \mid \mathbf{U}_{1:t})$

Implications

Fix $g^{-i}$ and consider optimal design of $g^i$. Let $R^i_t = (X^i_t, \mathbf{U}_{1:t-1})$. Then

$\{R^i_t, t = 1, \dots\}$ is a controlled MDP with control action $u^i_t$.

$\mathbb{P}(r^i_{t+1} \mid r^i_{1:t}, u^i_{1:t}) = \mathbb{P}(r^i_{t+1} \mid r^i_t, u^i_t)$

$\mathbb{E}[c_t(\mathbf{x}_t, \mathbf{u}_t) \mid r^i_{1:t}, u^i_{1:t}] = \mathbb{E}[c_t(\mathbf{x}_t, \mathbf{u}_t) \mid r^i_t, u^i_t]$

SLIDE 9

Structural result . . . (cont.)

Original model

$u^i_t = g^i_t(x^i_{1:t}, \mathbf{u}_{1:t-1})$

Implication of person-by-person optimality argument

$u^i_t = g^i_t(r^i_t) = g^i_t(x^i_t, \mathbf{u}_{1:t-1})$

Design difficulty

Data at the controller is still increasing with time

SLIDE 10

A coordinator based on common information

General idea proposed in (Mahajan, Nayyar, and Teneketzis 2008)

[Figure: two controllers; controller 1 receives $(X^1_t, \mathbf{U}_{1:t-1})$ and controller 2 receives $(X^2_t, \mathbf{U}_{1:t-1})$, and each generates its own control action.]

SLIDE 11

A coordinator based on common information (cont.)

[Figure: a coordinator $h_t$ receives the common information $\mathbf{U}_{1:t-1}$ and sends $(d^1_t, d^2_t)$ to the two controllers; controller $i$ receives $X^i_t$ and generates its control action.]

where $d^i_t(\cdot) = g^i_t(\cdot, \mathbf{u}_{1:t-1})$

SLIDE 12

A coordinator based on common information (cont.)

Solution approach

The coordinated system is a POMDP
Identify the structure of optimal coordination strategies for the coordinated system
Show that the coordinated system is equivalent to the original model
Translate the structure of optimal coordination strategies to the original model

SLIDE 13

The coordinated system

[Figure: coordinator $h_t$ with input $\mathbf{U}_{1:t-1}$ and output $(d^1_t, d^2_t)$; controller $i$ receives $X^i_t$.]

State: $\mathbf{x}_t = (x^1_t, \dots, x^n_t)$

Observations: $\mathbf{u}_{t-1} = (u^1_{t-1}, \dots, u^n_{t-1})$

Control actions: $\mathbf{d}_t = (d^1_t, \dots, d^n_t)$

Coordination rule: $h_t : \Big( \prod_{i=1}^{n} \mathcal{U}^i \Big)^{t-1} \to \prod_{i=1}^{n} \big( \mathcal{X}^i \to \mathcal{U}^i \big)$

$\mathbf{d}_t = h_t(\mathbf{u}_{1:t-1})$

Structure of optimal coordination strategy

Define $\Xi_t(\mathbf{x}) = \mathbb{P}(\text{state} \mid \text{history of observations}) = \mathbb{P}(\mathbf{X}_t = \mathbf{x} \mid \mathbf{U}_{1:t-1})$. Then, without loss of optimality, $\mathbf{d}_t = h_t(\xi_t)$

SLIDE 14

The coordinated system (cont.)

Dynamic programming decomposition

$V_t(\xi) = \min_{\mathbf{d}} \mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) + V_{t+1}(\Xi_{t+1}) \mid \Xi_t = \xi]$

Salient features

The optimization at each step is a functional optimization problem.

(In our opinion) functional optimization at each step is the only way to circumvent the issue of signaling.

SLIDE 15

Translation of results back to the original system

[Figure: coordinator $h_t$ with input $\mathbf{U}_{1:t-1}$ and output $(d^1_t, d^2_t)$; controller $i$ receives $X^i_t$.]

Structural result

Without loss of optimality, $u^i_t = d^i_t(x^i_t) = h^i_t(\xi_t)(x^i_t) = g^i_t(x^i_t, \xi_t)$

Dynamic programming decomposition

Solve the DP for the coordinated system. Choose $g^i_t(x^i_t, \xi_t) = h^i_t(\xi_t)(x^i_t)$

SLIDE 16

Further simplification of structural result

Recall main lemma:

The state processes are conditionally independent given the past control actions.

$\mathbb{P}(\mathbf{X}_{1:t} = \mathbf{x}_{1:t} \mid \mathbf{U}_{1:t}) = \prod_{i=1}^{n} \mathbb{P}(X^i_{1:t} = x^i_{1:t} \mid \mathbf{U}_{1:t})$

Implication

$\xi_t(\mathbf{x}) = \mathbb{P}(\mathbf{X}_t = \mathbf{x} \mid \mathbf{U}_{1:t-1}) = \prod_{i=1}^{n} \pi^i_t(x^i)$
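The product form implied by the main lemma can be illustrated numerically. The sketch below is a minimal example with made-up marginals; the function name `joint_belief` and the dictionary representation of beliefs are assumptions for illustration. It rebuilds the joint belief over $\mathcal{X}^1 \times \cdots \times \mathcal{X}^n$ from the per-subsystem marginals.

```python
from itertools import product
from typing import Dict, List, Tuple


def joint_belief(marginals: List[Dict[int, float]]) -> Dict[Tuple[int, ...], float]:
    """Build xi(x) = prod_i pi^i(x^i) from per-subsystem marginal beliefs."""
    supports = [sorted(m) for m in marginals]
    joint: Dict[Tuple[int, ...], float] = {}
    for x in product(*supports):
        p = 1.0
        for i, xi in enumerate(x):
            p *= marginals[i][xi]  # conditional independence across subsystems
        joint[x] = p
    return joint


# Hypothetical marginals for two binary subsystems.
pi = [{0: 0.25, 1: 0.75}, {0: 0.5, 1: 0.5}]
xi = joint_belief(pi)
```

Because the joint belief is fully determined by the marginals, a controller or coordinator only needs to track the marginals.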

SLIDE 17

Further simplification of structural result (cont.)

Simplified structural result

Without loss of optimality, $u^i_t = g^i_t(x^i_t, \xi_t) = g^i_t(x^i_t, \boldsymbol{\pi}_t)$

Significant reduction in size: $\xi_t \in \Delta(\mathcal{X}^1 \times \cdots \times \mathcal{X}^n)$ while $\boldsymbol{\pi}_t \in \Delta(\mathcal{X}^1) \times \cdots \times \Delta(\mathcal{X}^n)$

Simplified dynamic programming decomposition

$V_t(\boldsymbol{\pi}) = \min_{\mathbf{d}} \mathbb{E}[c_t(\mathbf{X}_t, \mathbf{U}_t) + V_{t+1}(\boldsymbol{\Pi}_{t+1}) \mid \boldsymbol{\Pi}_t = \boldsymbol{\pi}]$
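The size reduction can be quantified by counting free parameters: a distribution on a finite set with $k$ elements has $k - 1$ degrees of freedom. The sketch below compares the two information states under that counting; the function names and the example sizes are hypothetical.

```python
from math import prod
from typing import Sequence


def joint_simplex_dim(sizes: Sequence[int]) -> int:
    """Free parameters of a belief on the product space X^1 x ... x X^n."""
    return prod(sizes) - 1


def product_simplex_dim(sizes: Sequence[int]) -> int:
    """Free parameters of a tuple of marginal beliefs, one per subsystem."""
    return sum(k - 1 for k in sizes)


# Ten binary subsystems (illustrative sizes only).
sizes = [2] * 10
joint_dim = joint_simplex_dim(sizes)      # grows exponentially in n
product_dim = product_simplex_dim(sizes)  # grows linearly in n
```

For ten binary subsystems the joint belief has 1023 free parameters, while the tuple of marginals has 10.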

SLIDE 18

Recap of structural results

Original: $u^i_t = g^i_t(x^i_{1:t}, \mathbf{u}_{1:t-1})$

Using person-by-person approach: $u^i_t = g^i_t(x^i_t, \mathbf{u}_{1:t-1})$

Using the common information approach of (NMT 2008, 2011): $u^i_t = g^i_t(x^i_t, \xi_t)$, where $\xi_t = \mathbb{P}(\mathbf{X}_t \mid \mathbf{u}_{1:t-1})$

Using specific conditional independence due to the dynamics: $u^i_t = g^i_t(x^i_t, \boldsymbol{\pi}_t)$, where $\pi^i_t = \mathbb{P}(X^i_t \mid \mathbf{u}_{1:t-1})$

SLIDE 19

An Example: Two-user multiple access broadcast

Two users, each with a single-slot buffer

$x^i_t \in \{0, 1\}$: # of packets in queue

$w^i_t \in \{0, 1\}$: # of arrivals, $\sim \mathrm{Ber}(p^i)$

$u^i_t \in \{0, 1\}$: # of transmitted packets

Multiple-access channel

Throughput: $r_t = u^1_t (1 - u^2_t) + (1 - u^1_t) u^2_t$

$r_t$ is available to both users after a one-step delay

State update: $x^i_{t+1} = \min(x^i_t - u^i_t r_t + w^i_t, 1)$
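One slot of this example can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the function name `step` is hypothetical, a user is assumed to transmit only when its buffer holds a packet, and the single-slot buffer is modelled by capping the queue length at one packet with `min`.

```python
from typing import Tuple

Pair = Tuple[int, int]


def step(x: Pair, u: Pair, w: Pair) -> Tuple[int, Pair]:
    """One slot of two-user multiple access broadcast (illustrative sketch).

    x: packets in each queue, u: transmission decisions, w: new arrivals.
    Returns the throughput of the slot and the next queue state.
    """
    # Assumption: a user can only transmit a packet it actually holds.
    if any(ui > xi for ui, xi in zip(u, x)):
        raise ValueError("cannot transmit from an empty buffer")
    # Exactly one transmission succeeds; zero or two transmissions give r = 0.
    r = u[0] * (1 - u[1]) + (1 - u[0]) * u[1]
    # A successful sender frees its slot; the buffer holds at most one packet.
    x_next = (
        min(x[0] - u[0] * r + w[0], 1),
        min(x[1] - u[1] * r + w[1], 1),
    )
    return r, x_next


# User 1 transmits alone and succeeds; user 2 keeps its packet and drops the arrival.
r, x_next = step((1, 1), (1, 0), (0, 1))
```

When both users transmit, `r` is 0 and both packets stay queued, which is the collision case that makes coordination between the users valuable.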

SLIDE 20

An Example: Two-user multiple access broadcast (cont.)

Literature overview

Symmetric arrivals: Hluchyj and Gallager, 1981: feasible lower bound.
Symmetric arrivals: Ooi and Wornell, 1996: genie-aided upper bound that numerically matched the lower bound.
Asymmetric arrivals: used as a benchmark problem in the AI community (Hansen et al., 2004; Bernstein et al., 2005; Szer and Charpillet, 2006) for numerical algorithms for DEC-POMDPs.

SLIDE 21

An Example: Two-user multiple access broadcast (cont.)

Structure of optimal control policy

$\pi^i_t$ is equivalent to $\pi^i_t(1) =: q^i_t \in [0, 1]$

$d^i_t$ is equivalent to $d^i_t(1) =: s^i_t \in \{0, 1\}$

Structure of optimal policy: $u^i_t = s^i_t \cdot x^i_t$, where $(s^1_t, s^2_t) = h_t(q^1_t, q^2_t)$

SLIDE 22

An Example: Two-user multiple access broadcast (cont.)

Optimal policy for symmetric arrivals

[Equations: the slide defines an operator $A$ on $[0, 1]$, a characteristic polynomial $\varphi_n$ with root $\alpha_n$ in $[0, 1]$, and a threshold $\tau$, and gives the optimal performance $J^*$ in closed form with two cases. The expressions themselves did not survive extraction.]

SLIDE 23

An Example: Two-user multiple access broadcast (cont.)

Optimal policy

[Equations: the slide gives the optimal coordination rule $h^*$ as a function of two arguments in two regimes, above and below the threshold $\tau$. Above $\tau$, the rule depends only on which argument is larger, with either choice allowed when they are equal. Below $\tau$, an index $n \in \mathbb{N}$ is selected using the roots $\alpha_{n+1} < \alpha_n$, and the rule additionally compares each argument with a quantity built from $A^n$. The exact expressions did not survive extraction.]

Analytic proof of optimality of the policy proposed by Hluchyj and Gallager, 1981.

SLIDE 24

Conclusion

Coupled subsystems with control-sharing

Non-classical information structure

Use properties of the system dynamics and the common information approach of (Mahajan, Nayyar, Teneketzis 2008) to find the structure of the optimal controller and a dynamic programming decomposition.

Allows using standard tools from stochastic control to analyze specific applications.

Key take-home points

Subclasses of decentralized control problems with signaling are solvable!

Each step of the DP is a functional optimization problem.