A graphical model for sequential teams
  1. A graphical model for sequential teams
Aditya Mahajan and Sekhar Tatikonda
Dept. of Electrical Engineering, Yale University
Presented at: ConCom Workshop, June 27, 2009

  2. A glimpse of the result

  3. Structural results in sequential teams
◦ Example: MDP (Markov decision process)
⊲ Controlled Markov chain: Pr(x_t | x_1, ..., x_{t-1}, u_1, ..., u_{t-1}) = Pr(x_t | x_{t-1}, u_{t-1})
⊲ Controller: u_t = g_t(x_1, ..., x_t, u_1, ..., u_{t-1})
⊲ Reward: r_t = ρ_t(x_t, u_t)
⊲ Objective: maximize E[ Σ_{t=1}^T r_t ]
◦ Structural result
⊲ Without loss of optimality, u_t = g_t(x_t)
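The structural result can be sanity-checked by brute force on a toy instance. The sketch below (transition and reward numbers are made up for illustration) enumerates every history-dependent design u_2 = g_2(x_1, u_1, x_2) and every Markov design u_2 = g_2(x_2) for a 2-state, 2-action, horizon-2 MDP, and confirms the best achievable values coincide.

```python
from itertools import product

# Toy 2-state, 2-action, horizon-2 MDP; the numbers are illustrative only.
S, U = (0, 1), (0, 1)
p1 = {0: 0.5, 1: 0.5}                      # initial distribution of x_1

def ptrans(x2, x1, u1):                    # Pr(x_2 | x_1, u_1)
    return 0.8 if x2 == (x1 ^ u1) else 0.2

def rho(x, u):                             # per-stage reward rho_t(x_t, u_t)
    return 1.0 if x == u else 0.0

def value(g1, g2):
    """Expected total reward; g2 maps the full history (x1, u1, x2) to u2."""
    total = 0.0
    for x1 in S:
        u1 = g1[x1]
        for x2 in S:
            u2 = g2[(x1, u1, x2)]
            total += p1[x1] * ptrans(x2, x1, u1) * (rho(x1, u1) + rho(x2, u2))
    return total

histories = list(product(S, U, S))
# Optimum over all history-dependent designs u_2 = g_2(x_1, u_1, x_2) ...
best_history = max(
    value(dict(zip(S, a)), dict(zip(histories, b)))
    for a in product(U, repeat=len(S))
    for b in product(U, repeat=len(histories)))
# ... equals the optimum over Markov designs u_2 = g_2(x_2).
best_markov = max(
    value(dict(zip(S, a)), {h: b[h[2]] for h in histories})
    for a in product(U, repeat=len(S))
    for b in product(U, repeat=len(S)))
assert abs(best_history - best_markov) < 1e-9
```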

  4. Graphically . . . original [figure: DAFG of the MDP with variable nodes x_t, u_t, r_t and factor nodes f_t, g_t, ρ_t]

  5. Graphically . . . structural results [figure: the MDP DAFG after the structural result u_t = g_t(x_t)]

  6. Structural results in sequential teams
◦ Example: real-time source coding (Hans S. Witsenhausen, "On the structure of real-time source coders," Bell System Technical Journal, vol. 58, no. 6, pp. 1437-1451, July-August 1979)
⊲ Source: first-order Markov source {x_t, t = 1, ...}
⊲ Real-time source coder: y_t = c_t(x_1, ..., x_t, y_1, ..., y_{t-1})
⊲ Finite-memory decoder: x̂_t = g_t(y_t, m_{t-1}), m_t = l_t(y_t, m_{t-1})
⊲ Cost: d_t = ρ_t(x_t, x̂_t)
◦ Structural result
⊲ Without loss of optimality, y_t = c_t(x_t, m_{t-1})

  7. Graphically . . . original [figure: DAFG of the source-coding system with variable nodes x_t, y_t, m_t, x̂_t, d_t and factor nodes f_t, c_t, g_t, l_t, ρ_t]

  8. Graphically . . . structural results [figure: the source-coding DAFG after the structural result y_t = c_t(x_t, m_{t-1})]

  9. The main idea
◦ Represent a sequential team as a directed graph
◦ Simplify the graph

  10. Sequential teams – Salient features
◦ A team is sequential if and only if there exists a partial order on the system variables.
◦ There is no loss of optimality in restricting attention to non-randomizing decision makers.
◦ Data available at a DM can be ignored if it is independent of the future rewards conditioned on the other data at the DM.
◦ Variables functionally determined from the data available at a DM can be assumed to be observed at the DM.

  11. Graphical models – Salient features
◦ Any partial order gives rise to a DAG (directed acyclic graph).
◦ A DAFG (directed acyclic factor graph) can be used to efficiently check for conditional independence using d-separation.
◦ A DAFG can be used to efficiently check for conditional independence with deterministic nodes using D-separation.

  12. Match between features of sequential teams and graphical models
The rest is a matter of details . . .

  13. The model
◦ Components of a sequential team
⊲ A set N of indices of system variables {X_n, n ∈ N}, with finite state spaces {𝒳_n, n ∈ N}
  − A ⊂ N: variables generated by DMs
  − N \ A: variables generated by nature
  − R ⊂ N: reward variables
⊲ Information sets {I_n, n ∈ N}, such that I_n ⊆ {1, ..., n}; ℐ_n = ∏_{i ∈ I_n} 𝒳_i
⊲ F_{N\A} = {f_n, n ∈ N\A}, where f_n is a conditional PMF of X_n given I_n
⊲ Design: G_A = {g_n, n ∈ A}, where g_n is a decision rule from ℐ_n to 𝒳_n

  14. The model
◦ Probability measure induced by a design
P^{G_A}(X_N) = ∏_{n ∈ N\A} f_n(X_n | I_n) · ∏_{n ∈ A} I[X_n = g_n(I_n)]
◦ Optimization problem
Minimize E[ Σ_{n ∈ R} X_n ], where the expectation is with respect to P^{G_A}.
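A minimal concrete instance of this factorization, for a hypothetical toy team not taken from the talk: nature draws x, the only DM observes x and picks u, and the cost variable is r = I[x ≠ u], so N = {x, u, r}, A = {u}, R = {r}. The induced measure is nature's PMF times an indicator for the decision rule.

```python
from itertools import product

# Toy team (illustrative numbers): N = {x, u, r}, A = {u}, R = {r}.
f_x = {0: 0.3, 1: 0.7}                       # nature's PMF for x

def induced_measure(g):
    """P^{G_A}: nature's factors times indicator factors for decision rules."""
    P = {}
    for x, u in product((0, 1), repeat=2):
        r = int(x != u)                      # r is a deterministic function of (x, u)
        P[(x, u, r)] = f_x[x] * (1.0 if u == g[x] else 0.0)
    return P

g = {0: 0, 1: 1}                             # a design G_A
P = induced_measure(g)
assert abs(sum(P.values()) - 1.0) < 1e-12    # P^{G_A} is a probability measure
expected_cost = sum(p * xur[2] for xur, p in P.items())   # E[ sum over R ]
```

For this design the DM always matches x, so the expected cost is zero; the point is only that a design turns the team into an ordinary probability measure that can be optimized over.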

  15. Representation as a graphical model
◦ Directed acyclic factor graph (DAFG)
◦ Nodes
⊲ Variable node n ≡ system variable X_n
⊲ Factor node ñ ≡ conditional PMF f_n or decision rule g_n
◦ Edges
⊲ (i, ñ), for each n ∈ N and i ∈ I_n
⊲ (ñ, n), for each n ∈ N
◦ Acyclicity
⊲ Sequential team ⇒ partial order on variable nodes ⇒ acyclic graph
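The edge construction is mechanical given the information sets. A small sketch (the node names and the `('f', n)` encoding of factor nodes are my own):

```python
def dafg_edges(info_sets):
    """Build the DAFG edge list from information sets {n: I_n}.
    The factor node n~ attached to variable n is encoded as ('f', n)."""
    edges = []
    for n, I_n in info_sets.items():
        for i in I_n:
            edges.append((i, ('f', n)))   # data edge  i -> n~
        edges.append((('f', n), n))       # output edge n~ -> n
    return edges

# Information sets of the 3-step MDP example.
mdp = {
    "x1": [], "u1": ["x1"],
    "x2": ["x1", "u1"], "u2": ["x1", "x2", "u1"],
    "x3": ["x2", "u2"], "u3": ["x1", "x2", "x3", "u1", "u2"],
    "r1": ["x1", "u1"], "r2": ["x2", "u2"], "r3": ["x3", "u3"],
}
edges = dafg_edges(mdp)
```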

  16. Graphical models – Terminology
◦ parents(n)
⊲ {m : m → n}
⊲ Parents of a control (factor) node = data observed by the controller
◦ children(n)
⊲ {m : n → m}
⊲ Children of a control node = control action
◦ ancestors(n)
⊲ {m : ∃ directed path from m to n}
⊲ Ancestors of a control node = all nodes that affect the observed data
◦ descendants(n)
⊲ {m : ∃ directed path from n to m}
⊲ Descendants of a control node = all nodes affected by the control action
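Ancestors and descendants are both transitive closures of the edge relation, so a single traversal serves for both: run it on a parent map for ancestors, on a child map for descendants. A stdlib sketch, using node names from the running MDP example:

```python
def closure(adj, start):
    """All nodes reachable from `start` via `adj` (node -> set of neighbours),
    excluding `start` itself.  With a parent map this gives ancestors(start);
    with a child map it gives descendants(start)."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj.get(v, ()))
    return seen - {start}

# Parent map of a fragment of the MDP example.
parents = {
    "u1": {"x1"}, "x2": {"x1", "u1"}, "u2": {"x1", "x2", "u1"},
    "x3": {"x2", "u2"}, "u3": {"x1", "x2", "x3", "u1", "u2"}, "r3": {"x3", "u3"},
}
children = {}
for v, ps in parents.items():
    for p in ps:
        children.setdefault(p, set()).add(v)

assert closure(parents, "x3") == {"x1", "u1", "x2", "u2"}     # ancestors
assert closure(children, "u2") == {"x3", "u3", "r3"}          # descendants
```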

  17. Graphical Models — Example [figure: the MDP DAFG]

  18. Graphical Models — Variable nodes [figure: the MDP DAFG with reward nodes (r_t) and non-reward nodes (x_t, u_t) distinguished]

  19. Graphical Models — Factor nodes [figure: the MDP DAFG with control factors (g_t) and stochastic factors (f_t, ρ_t) distinguished]

  20. Graphical Models — Parents and Children [figure: the MDP DAFG with the parents and children of a control factor node highlighted]

  21. Graphical Models — Ancestors and descendants [figure: the MDP DAFG with the ancestors and descendants of a control factor node highlighted]

  22. Structural results
◦ The main idea
⊲ If some data available at a DM is independent of the future rewards given the control action and the other data at the DM, then that data can be ignored.
⊲ Can we automate this process?

  23. Graphical models can easily test conditional independence. Structural result ≡ conditional independence.

  24. Conditional independence
◦ Three canonical graphs to verify x ⊥⊥ z | y: Markov chain (x → y → z), hidden cause (x ← y → z), and explanation (x → y ← z).
◦ Blocking of a trail
A trail from a to b is blocked by C if ∃ a node v on the trail such that either:
⊲ the trail passes through v as → v →, ← v ←, or ← v →, and v ∈ C; or
⊲ the trail passes through v as → v ←, and neither v nor any of v's descendants is in C.

  25. Conditional independence
◦ d-separation: A is d-separated from B by C if all trails from A to B are blocked by C.
◦ Conditional independence: for any probability measure P that factorizes according to a DAFG, A d-separated from B by C implies X_A is conditionally independent of X_B given X_C, P-a.s.
◦ Efficient algorithms to verify d-separation
⊲ Moral graph
⊲ Bayes ball
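The moral-graph test mentioned above fits in a few lines: restrict to the ancestral subgraph of A ∪ B ∪ C, marry co-parents, drop edge directions, and check ordinary graph separation. A sketch (DAGs given as node → parent-set dicts), exercised on the three canonical graphs:

```python
from itertools import combinations

def d_separated(parents, A, B, C):
    """Moral-graph test for 'A d-separated from B by C' in the DAG
    given as {node: set of parents}."""
    A, B, C = set(A), set(B), set(C)
    # 1. Ancestral subgraph of A | B | C.
    relevant, stack = set(), list(A | B | C)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents.get(v, ()))
    # 2. Moralize: keep parent-child edges, marry co-parents, drop directions.
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = [p for p in parents.get(v, ()) if p in relevant]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # 3. d-separated iff C separates A from B in the moral graph.
    frontier, seen = A - C, set(C)
    while frontier:
        v = frontier.pop()
        if v in B:
            return False
        seen.add(v)
        frontier |= adj[v] - seen
    return True

chain = {"y": {"x"}, "z": {"y"}}                 # x -> y -> z  (Markov chain)
vee = {"y": {"x", "z"}}                          # x -> y <- z  (explanation)
assert d_separated(chain, {"x"}, {"z"}, {"y"})   # observing y blocks the chain
assert d_separated(vee, {"x"}, {"z"}, set())     # unobserved collider blocks
assert not d_separated(vee, {"x"}, {"z"}, {"y"}) # observing the collider connects
```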

  26. Automated structural results
◦ First attempt
⊲ Dependent rewards: R_d(ñ) = R ∩ descendants(ñ)
⊲ Irrelevant data: at a control node ñ, parent i is irrelevant if R_d(ñ) is d-separated from i given (parents(ñ) ∪ children(ñ)) \ {i}
⊲ Requisite data: all parents that are not irrelevant
◦ Structural result
⊲ Without loss of optimality, we can remove irrelevant data: u_n = g_n(requisite(ñ))

  27. Structural Results for MDP — Step 1 [figure: the MDP DAFG]

  28. Structural Results for MDP — Step 1 [figure: the MDP DAFG]
◦ Pick node g_3.
⊲ Original: u_3 = g_3(x_1, x_2, x_3, u_1, u_2)
⊲ requisite(g_3) = {x_3}
⊲ Thus, u_3 = g_3(x_3)

  29. Structural Results for MDP — Step 2 [figure: the MDP DAFG after Step 1]

  30. Structural Results for MDP — Step 2 [figure: the MDP DAFG after Step 1]
◦ Pick node g_2.
⊲ Original: u_2 = g_2(x_1, x_2, u_1)
⊲ requisite(g_2) = {x_2}
⊲ Thus, u_2 = g_2(x_2)
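The two steps above can be reproduced mechanically. The sketch below (node names and the variable-level encoding are my own) collapses each factor node into its output variable, so parents(g_t) become the parents of u_t, computes requisite parents with the moral-graph d-separation test, and then repeats on the simplified graph. Note the order matters: requisite(g_2) = {x_2} only after g_3 has already been simplified, since on the original graph x_1 still reaches r_3 through the unsimplified u_3.

```python
from itertools import combinations

def d_separated(parents, A, B, C):
    """Moral-graph test for d-separation in a DAG given as {node: parent set}."""
    A, B, C = set(A), set(B), set(C)
    relevant, stack = set(), list(A | B | C)   # ancestral subgraph
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents.get(v, ()))
    adj = {v: set() for v in relevant}         # moralize
    for v in relevant:
        ps = [p for p in parents.get(v, ()) if p in relevant]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    frontier, seen = A - C, set(C)             # separation check
    while frontier:
        v = frontier.pop()
        if v in B:
            return False
        seen.add(v)
        frontier |= adj[v] - seen
    return True

def requisite(parents, rewards, action):
    """Parents of the control variable `action` that are not irrelevant:
    i is irrelevant if the relevant rewards R_d are d-separated from i
    given the remaining observations and the action itself."""
    children = {}
    for v, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(v)
    desc, stack = set(), [action]              # descendants(action)
    while stack:
        v = stack.pop()
        if v not in desc:
            desc.add(v)
            stack.extend(children.get(v, ()))
    r_d = desc & rewards
    obs = parents[action]
    return {i for i in obs
            if not d_separated(parents, {i}, r_d, (obs | {action}) - {i})}

# Variable-level DAG of the 3-step MDP (factor nodes collapsed into outputs).
parents = {
    "x1": set(), "u1": {"x1"},
    "x2": {"x1", "u1"}, "u2": {"x1", "x2", "u1"},
    "x3": {"x2", "u2"}, "u3": {"x1", "x2", "x3", "u1", "u2"},
    "r1": {"x1", "u1"}, "r2": {"x2", "u2"}, "r3": {"x3", "u3"},
}
rewards = {"r1", "r2", "r3"}

step1 = requisite(parents, rewards, "u3")      # Step 1: simplify g_3
assert step1 == {"x3"}
parents["u3"] = step1                          # apply u_3 = g_3(x_3)
step2 = requisite(parents, rewards, "u2")      # Step 2: simplify g_2
assert step2 == {"x2"}
```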

  31. Structural Results for MDP — Simplified [figure: the MDP DAFG with each u_t = g_t(x_t)]

  32. The first attempt, u_n = g_n(requisite(ñ)), does not work for all problems . . . even when structural simplification is possible.

  33. A real-time source coding problem (Hans S. Witsenhausen, "On the structure of real-time source coders," Bell System Technical Journal, vol. 58, no. 6, pp. 1437-1451, July-August 1979)
◦ Mathematical model
⊲ Source: first-order Markov source {x_t, t = 1, ...}
⊲ Real-time source coder: y_t = c_t(x_{1:t}, y_{1:t-1})
⊲ Finite-memory decoder: x̂_t = g_t(y_t, m_{t-1}), m_t = l_t(y_t, m_{t-1})
⊲ Cost: d_t = ρ_t(x_t, x̂_t)

  34. Model for real-time comm — Does not simplify [figure: the source-coding DAFG]

  35. Need to take care of deterministic variables!

  36. Functionally determined nodes
◦ Functionally determined
⊲ X_B is functionally determined by X_A if X_B ⊥⊥ X_N | X_A
◦ Conditional independence with functionally determined nodes
⊲ Can be checked using D-separation
⊲ Similar to d-separation: in the definition of blocking, replace "v ∈ C" by "v is functionally determined by C"
◦ Blocking of a trail (version that accounts for deterministic nodes)
A trail from a to b is blocked by C if ∃ a node v on the trail such that either:
⊲ the trail passes through v as → v →, ← v ←, or ← v →, and v is functionally determined by C; or
⊲ the trail passes through v as → v ←, and neither v nor any of v's descendants is in C.
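The functionally-determined set that D-separation conditions on can be computed as a simple fixed point over the deterministic (decision-rule) nodes: a node is determined by C if it is in C, or if it is deterministic and all its parents are already determined. A sketch on a hand-built fragment of the source-coding graph (names and parent sets are my own encoding); conditioning on x_1 already determines y_1, m_1, and x̂_1:

```python
def functionally_determined(parents, deterministic, C):
    """Closure of C under deterministic nodes: a node is determined by C if it
    is in C, or it is deterministic and all of its parents are determined."""
    det = set(C)
    changed = True
    while changed:
        changed = False
        for v in deterministic:
            if v not in det and parents[v] <= det:
                det.add(v)
                changed = True
    return det

# Fragment of the real-time source-coding graph (variable level, illustrative).
parents = {
    "x1": set(), "y1": {"x1"}, "m1": {"y1"}, "xhat1": {"y1"},
    "x2": {"x1"}, "y2": {"x1", "x2", "y1"}, "m2": {"y2", "m1"},
}
deterministic = {"y1", "m1", "xhat1", "y2", "m2"}   # outputs of decision rules

assert functionally_determined(parents, deterministic, {"x1"}) == \
    {"x1", "y1", "m1", "xhat1"}
```

The nature variables x_2 (and anything downstream of it) stay undetermined until x_2 itself is conditioned on, which is exactly the distinction D-separation exploits.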

  37. Automated structural results
◦ Second attempt
⊲ Irrelevant data: replace d-separation by D-separation
⊲ Requisite data: all parents that are not irrelevant
◦ Structural result
⊲ Without loss of optimality, we can remove irrelevant data and add appropriate functionally determined data:
u_n = g_n(requisite(ñ), functionally_detm(ñ) ∩ ancestors(R_d(ñ)))
