
Identifying tractable decentralized control problems on the basis of information structure

Aditya Mahajan, Ashutosh Nayyar, and Demosthenis Teneketzis. Dept. of EECS, University of Michigan, Ann Arbor. September 26, 2008.


  1. Identifying tractable decentralized control problems on the basis of information structure
     Aditya Mahajan, Ashutosh Nayyar, and Demosthenis Teneketzis
     Dept. of EECS, University of Michigan, Ann Arbor
     September 26, 2008

  2. Optimal design of decentralized systems with non-classical information structures
     ◦ Difficulties: Conceptual and computational
     ◦ Results of this paper: Consider two general models of decentralized systems and obtain a sequential decomposition for their finite and infinite horizon cases.
     ◦ Our models encompass
       ⊲ k-step delay sharing pattern (Walrand and Varaiya, 1978)
       ⊲ Generic team model of Witsenhausen (1988)
       ⊲ Standard form (Witsenhausen, 1973)
     ◦ Main idea: viewed appropriately, these models are equivalent to POMDPs with functions as control actions
     ◦ Numerical solution can be obtained using existing techniques for POMDPs

  3. Model A for two agents
     [Figure: block diagram of the plant (state $X_t$), Agent 1, and Agent 2, with signals $Y_t$, $Z^1_t$, $U^1_t$, $Z^2_t$, $U^2_t$.]

  4. Model A for two agents
     ◦ Plant: $X_{t+1} = f_t(X_t, U^1_t, U^2_t, W_t)$
     ◦ Observations
       ⊲ Common message: $Y_t = c_t(X_t, U^1_{t-1}, U^2_{t-1}, Q_t)$
       ⊲ Private message: $Z^1_t = h^1_t(X_t, N^1_t)$, $Z^2_t = h^2_t(X_t, U^1_t, N^2_t)$
     ◦ Agent $k$
       ⊲ Control: $U^k_t = g^k_t(Y_t, Z^k_t, M^k_{t-1})$
       ⊲ Memory update: $M^k_t = l^k_t(Y_t, Z^k_t, M^k_{t-1})$
     ◦ Design ≡ all control and memory update functions of both agents
     ◦ Cost at time $t$: $\rho_t(X_t, U^1_t, U^2_t)$. Cost of a design: $E\left\{ \sum_{t=1}^{T} \rho_t(X_t, U^1_t, U^2_t) \,\middle|\, \text{Design} \right\}$
     ◦ Objective: Determine an optimal design
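
The model equations above can be read as a simulation loop. The following is a minimal sketch, not the authors' code: it assumes time-invariant functions (the slides allow dependence on $t$), noise drawn uniformly from [0, 1), and zero as the initial memory and as the placeholder for the actions before time 1. The names `simulate_model_a` and `average_cost` are hypothetical.

```python
import random

def simulate_model_a(design, horizon, f, c, h1, h2, rho, x1, seed=0):
    """Simulate one sample path of two-agent Model A and return its total cost.

    design = ((g1, l1), (g2, l2)) holds the control and memory update
    functions of both agents. All functions are time-invariant here, which
    is an assumption made for brevity.
    """
    rng = random.Random(seed)
    (g1, l1), (g2, l2) = design
    x = x1
    m1 = m2 = 0            # assumed initial memories M^1_0 and M^2_0
    u1_prev = u2_prev = 0  # assumed placeholder for U^1_0 and U^2_0
    total = 0.0
    for _ in range(horizon):
        y = c(x, u1_prev, u2_prev, rng.random())  # common message Y_t
        z1 = h1(x, rng.random())                  # private message Z^1_t
        u1 = g1(y, z1, m1)                        # control U^1_t
        m1_next = l1(y, z1, m1)                   # memory M^1_t
        z2 = h2(x, u1, rng.random())              # Z^2_t may depend on U^1_t
        u2 = g2(y, z2, m2)                        # control U^2_t
        m2_next = l2(y, z2, m2)                   # memory M^2_t
        total += rho(x, u1, u2)                   # cost at time t
        x = f(x, u1, u2, rng.random())            # plant update
        m1, m2 = m1_next, m2_next
        u1_prev, u2_prev = u1, u2
    return total

def average_cost(design, horizon, runs, **model):
    """Monte Carlo estimate of the cost of a design over `runs` sample paths."""
    costs = (simulate_model_a(design, horizon, seed=s, **model) for s in range(runs))
    return sum(costs) / runs
```

Agent 2 is evaluated after agent 1 within each step because $Z^2_t$ depends on $U^1_t$, which is what makes the system sequential.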

  5. Model A for two agents
     ◦ Salient features
       ⊲ Sequential system
       ⊲ Non-classical information structures
     [Figure: block diagram of the plant ($X_t$), Agent 1, and Agent 2, followed by a timing diagram with rows Variables and Control laws, showing the order $X_t, Q_t, Y_t, N^1_t, Z^1_t, U^1_t, M^1_t, N^2_t, Z^2_t, U^2_t, M^2_t$ and the control laws $(g^1_t, l^1_t)$, $(g^2_t, l^2_t)$.]

  6. Consider the model from the point of view of a fictitious common agent

  7. Common Agent
     ◦ Common agent observes all common messages
     Think of control and memory update functions in two steps:
     $U^k_t = g^k_t(Y_t, Z^k_t, M^k_{t-1}) = \hat g^k_t(Z^k_t, M^k_{t-1})$, where $\hat g^k_t = \gamma^k_t(Y_t)$
     Similarly,
     $M^k_t = l^k_t(Y_t, Z^k_t, M^k_{t-1}) = \hat l^k_t(Z^k_t, M^k_{t-1})$, where $\hat l^k_t = \lambda^k_t(Y_t)$
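
For finite alphabets, the two-step view amounts to fixing the common message and storing what remains as a lookup table. The sketch below is illustrative only: `prescription` and `all_prescriptions` are hypothetical names, and the finite sets `Z`, `M`, `U` are assumptions. It shows how $\hat g^k_t = \gamma^k_t(Y_t)$ can be built from $g^k_t$, and how large the common agent's action set is when each action is itself a function.

```python
from itertools import product

def prescription(g, y, Z, M):
    """Return g_hat = gamma(y): the table (z, m) -> u obtained by fixing y in g."""
    return {(z, m): g(y, z, m) for z in Z for m in M}

def all_prescriptions(Z, M, U):
    """Enumerate every function from Z x M to U as a lookup table.

    There are len(U) ** (len(Z) * len(M)) such tables, which is the size of
    the common agent's action set for one agent's control function.
    """
    keys = [(z, m) for z in Z for m in M]
    return [dict(zip(keys, values)) for values in product(U, repeat=len(keys))]
```

With binary `Z`, `M`, and `U`, each agent already has 16 possible control tables, which gives a feel for how quickly the action space grows.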

  8. Common Agent's viewpoint
     [Figure: block diagram of the plant ($X_t$), Agent 1, and Agent 2, with signals $Y_t$, $Z^1_t$, $U^1_t$, $Z^2_t$, $U^2_t$.]

  9. Common Agent's viewpoint
     [Figure: the same block diagram with a common agent (CA) added; the CA observes $Y_t$ and supplies $(\hat g^1_t, \hat l^1_t)$ to Agent 1 and $(\hat g^2_t, \hat l^2_t)$ to Agent 2.]

  10. Common Agent's viewpoint
     ◦ Consider three time steps $t_0$, $t_1$, and $t_2$ in time interval $t$
       $S^0_t = (X_t, M^1_{t-1}, M^2_{t-1}, U^1_{t-1}, U^2_{t-1})$, $O^0_t = Y_t$
       $S^1_t = (X_t, M^1_{t-1}, M^2_{t-1})$, $O^1_t = -$ (none)
       $S^2_t = (X_t, M^1_t, M^2_{t-1}, U^1_t)$, $O^2_t = -$ (none)
     ◦ From the common agent's viewpoint, $\{ S^0_t, S^1_t, S^2_t,\ t = 1, \dots, T \}$ is a POMDP (partially observable Markov decision process) with:
       ⊲ State: $S^i_t$
       ⊲ Obs: $O^i_t$
       ⊲ Control actions: $(\hat g^k_t, \hat l^k_t)$
     [Figure: timing diagram with rows Variables, Obs of CA, Info states, and Control actions of CA, showing $X_t, Q_t, Y_t, N^1_t, Z^1_t, U^1_t, M^1_t, N^2_t, Z^2_t, U^2_t, M^2_t$, the observations $O^0_t, O^1_t, O^2_t$, the information states $\pi^0_t, \pi^1_t, \pi^2_t$, and the actions $(\hat g^1_t, \hat l^1_t)$, $(\hat g^2_t, \hat l^2_t)$ at steps $t_0$, $t_1$, $t_2$.]

  11. Sequential decomposition
     ◦ Information states
       $\pi^0_t = \Pr\left( S^0_t \mid Y^t, \hat g^{1,t-1}, \hat l^{1,t-1}, \hat g^{2,t-1}, \hat l^{2,t-1} \right)$
       $\pi^1_t = \Pr\left( S^1_t \mid Y^t, \hat g^{1,t-1}, \hat l^{1,t-1}, \hat g^{2,t-1}, \hat l^{2,t-1} \right)$
       $\pi^2_t = \Pr\left( S^2_t \mid Y^t, \hat g^{1,t}, \hat l^{1,t}, \hat g^{2,t-1}, \hat l^{2,t-1} \right)$
     ◦ Optimality equations
       $V^0_{T+1}(\pi^0_{T+1}) \equiv 0$, and for $t = 1, \dots, T$:
       $V^0_t(\pi^0_t) = E\left\{ V^1_t(\pi^1_t) \mid \pi^0_t \right\}$,
       $V^1_t(\pi^1_t) = \min_{\theta^1_t} E\left\{ V^2_t(\pi^2_t) \mid \pi^1_t, \theta^1_t \right\}$,
       $V^2_t(\pi^2_t) = \min_{\theta^2_t} E\left\{ \rho_t(X_t, U^1_t, U^2_t) + V^0_{t+1}(\pi^0_{t+1}) \mid \pi^2_t, \theta^2_t \right\}$,
       where $\theta^k_t = (\hat g^k_t, \hat l^k_t)$
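
The information states above are posterior beliefs on the state, and standard POMDP techniques propagate such beliefs with a Bayes update. The sketch below shows that generic update for a finite POMDP. It is not specific to Model A, and the function name `belief_update` and the callable interfaces `trans` and `obs_prob` are assumptions chosen for illustration.

```python
def belief_update(belief, action, obs, trans, obs_prob):
    """One Bayes update of a finite POMDP belief.

    belief:   dict mapping state -> probability
    trans:    trans(state, action) -> dict mapping next_state -> probability
    obs_prob: obs_prob(next_state, obs) -> probability of seeing obs
    """
    unnormalized = {}
    for state, p in belief.items():
        for nxt, q in trans(state, action).items():
            weight = p * q * obs_prob(nxt, obs)
            unnormalized[nxt] = unnormalized.get(nxt, 0.0) + weight
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {state: w / total for state, w in unnormalized.items()}
```

In the common agent's POMDP the `action` argument would be a pair of lookup tables $(\hat g^k_t, \hat l^k_t)$ rather than a scalar, which is why the action space is much larger than in a typical POMDP.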

  12. Models considered in the paper
     ◦ Model A
       ⊲ n-agent version of what was presented here
     ◦ Model B
       ⊲ Model A with no common messages
     ◦ Also consider infinite horizon problems

  13. Example: multiaccess broadcast
     [Figure: Tx 1 and Tx 2 connected to a broadcast medium.]
     ◦ MAB Channel
       ⊲ Single user transmits ⇒ successful transmission
       ⊲ Both users transmit ⇒ packet collision
     ◦ Transmitters
       ⊲ Queues with buffer size 1
       ⊲ Packet arrival is independent Bernoulli process
       ⊲ Packet held in queue until successful transmission

  14. Example: multiaccess broadcast
     ◦ Channel feedback
       Both transmitters know if there was no transmission, successful transmission, or a collision
     ◦ Policy of transmitters
       If packet is available, decide whether or not to transmit based on all past channel feedback
     ◦ Objective: Maximize throughput
       ⊲ Avoid collisions
       ⊲ Avoid idle
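
The channel and buffer rules of the multiaccess broadcast example can be sketched as one slot of a two-user system. This is a minimal illustration under stated assumptions: the function names and the feedback labels `"idle"`, `"success"`, `"collision"` are hypothetical, and a packet arriving in the same slot as a successful transmission is assumed to occupy the freed buffer.

```python
def channel_feedback(tx1, tx2):
    """Ternary feedback that both transmitters observe after a slot."""
    if tx1 and tx2:
        return "collision"
    if tx1 or tx2:
        return "success"
    return "idle"

def step(buffers, decisions, arrivals):
    """Advance the two-user system by one slot.

    buffers:   (b1, b2), 1 if the user holds a packet (buffer size 1)
    decisions: (d1, d2), 1 if the user would transmit when holding a packet
    arrivals:  (a1, a2), 1 if a new packet arrives at the user in this slot
    Returns the new buffers and the channel feedback.
    """
    # A user can transmit only if it actually holds a packet.
    tx = tuple(int(b and d) for b, d in zip(buffers, decisions))
    feedback = channel_feedback(*tx)
    new_buffers = []
    for b, t, a in zip(buffers, tx, arrivals):
        # A packet stays in the queue until it is transmitted successfully.
        held = b and not (t and feedback == "success")
        # Assumed timing: an arrival fills the buffer if it is (now) empty;
        # an arrival at a full buffer is dropped.
        new_buffers.append(1 if (held or a) else 0)
    return tuple(new_buffers), feedback
```

Throughput over a horizon is then the number of slots whose feedback is `"success"`, so both `"collision"` and `"idle"` slots are lost opportunities.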

  15. History of multiaccess broadcast
     ◦ Hluchyj and Gallager, “Multiaccess of a slotted channel by finitely many users”, NTC 81.
       ⊲ Considered symmetric arrival rates
       ⊲ Restricted attention to “window protocols”
     ◦ Ooi and Wornell, “Decentralized control of multiple access broadcast channels”, CDC 96.
       ⊲ Considered a relaxation of the problem
       ⊲ Numerically find optimal performance of the relaxed problem
       ⊲ Hluchyj and Gallager's scheme meets this upper bound
     ◦ AI Literature
       ⊲ Consider the case of asymmetric arrival rates
       ⊲ Approximate heuristic solutions for small horizons

  16. Multi-access broadcast is equivalent to Model A
     [Figure: Tx 1 and Tx 2 connected to a broadcast medium.]
     Agent 1 ≡ Tx 1
     Agent 2 ≡ Tx 2
     Private messages ≡ Number of packets in each buffer
     Common message ≡ Channel feedback
     ◦ Information state: $\pi_t = \Pr\left( Z^1_t, Z^2_t \mid \text{feedback} \right)$, $Z^k_t \in \{0, 1\}$
     ◦ Action space: $\hat g^k_t : \{0, 1\} \to \{\text{Tx}, \text{Don't Tx}\}$
     Equivalent to a POMDP with finite state and action spaces
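
Under this equivalence the information state is a distribution over the four buffer configurations, so it can be updated in closed form. The sketch below is one possible update and rests on assumptions that the slides do not state: the feedback labels, arrival probabilities `p1` and `p2`, and the timing (condition on feedback, clear the buffer of a successful sender, then apply arrivals). The name `mab_belief_update` is hypothetical.

```python
from itertools import product

def feedback_of(tx1, tx2):
    """Channel feedback for a pair of transmit indicators."""
    if tx1 and tx2:
        return "collision"
    return "success" if (tx1 or tx2) else "idle"

def mab_belief_update(belief, g1, g2, feedback, p1, p2):
    """Update Pr(Z1, Z2 | feedback) for the two-user multiaccess broadcast.

    belief: dict mapping (z1, z2) -> probability, with z in {0, 1}
    g1, g2: prescriptions, dicts mapping buffer content z -> 1 (Tx) or 0
    p1, p2: assumed Bernoulli arrival probabilities of the two users
    """
    # Step 1: keep only buffer states consistent with the observed feedback.
    posterior = {}
    for (z1, z2), pr in belief.items():
        tx1 = z1 and g1[z1]   # an empty buffer cannot transmit
        tx2 = z2 and g2[z2]
        if feedback_of(tx1, tx2) != feedback:
            continue
        # Step 2: a successful transmission empties the sender's buffer.
        if feedback == "success":
            z1, z2 = z1 - tx1, z2 - tx2
        posterior[(z1, z2)] = posterior.get((z1, z2), 0.0) + pr
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("feedback has zero probability under this belief")
    # Step 3: independent Bernoulli arrivals; a full buffer drops the arrival.
    updated = {}
    for (z1, z2), pr in posterior.items():
        for a1, a2 in product((0, 1), repeat=2):
            weight = (p1 if a1 else 1 - p1) * (p2 if a2 else 1 - p2)
            key = (max(z1, a1), max(z2, a2))
            updated[key] = updated.get(key, 0.0) + pr / total * weight
    return updated
```

Because each prescription is a map from {0, 1} to two actions, there are only four prescriptions per transmitter, which is what keeps the resulting POMDP finite.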

  17. Tractability
     ◦ Finite horizon problem
       ⊲ All system variables are finite valued
     ◦ Infinite horizon
       ⊲ All system variables take values in a time-invariant space
       ⊲ The system is time-homogeneous
     Conclusions
     ◦ Sequential decomposition of two general models of decentralized systems
     ◦ Equivalent to POMDPs (sometimes to POMDPs with finite state and action spaces)
     ◦ Harder to solve than POMDPs due to expansion of state and action spaces.

  18. Thank you
