


slide-1
SLIDE 1

A graphical model for sequential teams

Aditya Mahajan and Sekhar Tatikonda Dept of Electrical Engineering Yale University Presented at: ConCom Workshop, June 27, 2009

slide-2
SLIDE 2

A glimpse of the result

slide-3
SLIDE 3

Structural results in sequential teams

  • Example: MDP (Markov decision process)

⊲ Controlled MC: $\Pr(x_t \mid x_1, \dots, x_{t-1}, u_1, \dots, u_{t-1}) = \Pr(x_t \mid x_{t-1}, u_{t-1})$
⊲ Controller: $u_t = g_t(x_1, \dots, x_t, u_1, \dots, u_{t-1})$
⊲ Reward: $r_t = \rho_t(x_t, u_t)$
⊲ Objective: Maximize $E\left[\sum_{t=1}^{T} R_t\right]$

  • Structural results

⊲ Without loss of optimality, $u_t = g_t(x_t)$

slide-4
SLIDE 4

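Graphically . . . original

[Figure: factor graph of the three-step MDP with the original information structure. Variable nodes: x1, x2, x3, u1, u2, u3, r1, r2, r3. Factor nodes: f0, f1, f2, ρ1, ρ2, ρ3, g1, g2, g3.]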

slide-5
SLIDE 5

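Graphically . . . structural results

[Figure: factor graph of the three-step MDP after applying the structural results. Variable nodes: x1, x2, x3, u1, u2, u3, r1, r2, r3. Factor nodes: f0, f1, f2, ρ1, ρ2, ρ3, g1, g2, g3.]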

slide-6
SLIDE 6

Structural results in sequential teams

  • Example: real-time source coding

⊲ Source: First-order Markov source $\{x_t, t = 1, \dots\}$
⊲ Real-time source coder: $y_t = c_t(x_1, \dots, x_t, y_1, \dots, y_{t-1})$
⊲ Finite memory decoder: $\hat{x}_t = g_t(y_t, m_{t-1})$
⊲ $m_t = l_t(y_t, m_{t-1})$
⊲ Cost: $d_t = \rho_t(x_t, \hat{x}_t)$

Hans S. Witsenhausen, "On the structure of real-time source coders," Bell System Technical Journal, vol. 58, no. 6, pp. 1437-1451, July-August 1979.

  • Structural Results

⊲ Without loss of optimality, $y_t = c_t(x_t, m_{t-1})$

slide-7
SLIDE 7

Graphically . . . original

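[Figure: factor graph of the real-time source coding model for three time steps, original information structure. Variable nodes: x1, x2, x3, y1, y2, y3, x̂1, x̂2, x̂3, m1, m2, d1, d2, d3. Factor nodes: f1, f2, f3, ρ1, ρ2, ρ3, c1, c2, c3, g1, g2, g3, l1, l2.]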

slide-8
SLIDE 8

Graphically . . . structural results

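[Figure: factor graph of the real-time source coding model for three time steps, after applying the structural results. Variable nodes: x1, x2, x3, y1, y2, y3, x̂1, x̂2, x̂3, m1, m2, d1, d2, d3. Factor nodes: f1, f2, f3, ρ1, ρ2, ρ3, c1, c2, c3, g1, g2, g3, l1, l2.]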

slide-9
SLIDE 9

The main idea

  • Represent a sequential team as a directed graph
  • Simplify the graph
slide-10
SLIDE 10

Sequential teams – Salient features

  • A team is sequential if and only if there exists a partial order between the system variables.
  • There is no loss of optimality in restricting attention to non-randomizing decision makers.
  • Data available at a DM can be ignored if it is independent of the future rewards conditioned on other data at the DM.
  • Variables functionally determined from the data available at a DM can be assumed to be observed at the DM.

slide-11
SLIDE 11

Graphical models – Salient features

  • Any partial order gives rise to a DAG (Directed Acyclic Graph)
  • A DAFG (Directed Acyclic Factor Graph) can be used to efficiently check for conditional independence using d-separation
  • A DAFG can be used to efficiently check for conditional independence with deterministic nodes using D-separation

slide-12
SLIDE 12

Match between features of sequential teams and graphical models

The rest is a matter of details . . .

slide-13
SLIDE 13

The model

  • Components of a sequential team

⊲ A set $N$ of indices of system variables $\{X_n, n \in N\}$. Finite sets $\{\mathcal{X}_n, n \in N\}$ of state spaces of $X_n$
  − $A \subset N$, variables generated by DM
  − $N \setminus A$, variables generated by nature
  − $R \subset N$, reward variables
⊲ Information sets $\{I_n, n \in N\}$, such that $I_n \subseteq \{1, \dots, n\}$. $\mathcal{I}_n = \prod_{i \in I_n} \mathcal{X}_i$
⊲ $F_{N \setminus A} = \{f_n, n \in N \setminus A\}$, where $f_n$ is a conditional PMF of $X_n$ given $I_n$
⊲ Design: $G_A = \{g_n, n \in A\}$, where $g_n$ is a decision rule from $\mathcal{I}_n$ to $\mathcal{X}_n$

slide-14
SLIDE 14

The model

  • Probability measure induced by a design

$P_{G_A}(X_N) = \prod_{n \in N \setminus A} f_n(X_n \mid I_n) \prod_{n \in A} \mathbb{I}[X_n = g_n(I_n)]$

  • Optimization problem

Minimize $E\left[\sum_{n \in R} X_n\right]$, where the expectation is with respect to $P_{G_A}$.
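
When every space is small, the induced measure and the objective can be evaluated by brute-force enumeration. The following is a minimal sketch under stated assumptions: the function name `expected_total`, the dictionary encoding of spaces, information sets, nature's PMFs and decision rules, and the three-variable toy team are all illustrative and do not come from the talk.

```python
from itertools import product


def expected_total(order, spaces, info, nature, design, rewards):
    """Return E[sum of reward variables] under the measure induced by `design`.

    order   : variable indices, listed consistently with the partial order
    spaces  : index -> list of values (finite space of X_n)
    info    : index -> tuple of indices (information set I_n)
    nature  : index -> f_n(x_n, data), conditional PMF for n not in A
    design  : index -> g_n(data), decision rule for n in A
    rewards : set of reward indices R
    """
    total = 0.0
    for values in product(*(spaces[n] for n in order)):
        x = dict(zip(order, values))
        prob = 1.0
        for n in order:
            data = tuple(x[i] for i in info[n])
            if n in design:
                # indicator I[X_n = g_n(I_n)]
                prob *= 1.0 if x[n] == design[n](data) else 0.0
            else:
                prob *= nature[n](x[n], data)
        total += prob * sum(x[n] for n in rewards)
    return total


# Hypothetical toy team: nature draws X1 uniformly, the DM picks X2 from X1,
# and the reward variable X3 equals 1 exactly when X2 matches X1.
order = [1, 2, 3]
spaces = {1: [0, 1], 2: [0, 1], 3: [0, 1]}
info = {1: (), 2: (1,), 3: (1, 2)}
nature = {
    1: lambda x, data: 0.5,
    3: lambda x, data: 1.0 if x == int(data[0] == data[1]) else 0.0,
}
copy_design = {2: lambda data: data[0]}   # X2 = X1
constant_design = {2: lambda data: 0}     # X2 = 0 regardless of X1

value_copy = expected_total(order, spaces, info, nature, copy_design, {3})
value_constant = expected_total(order, spaces, info, nature, constant_design, {3})
```

Enumeration is exponential in the number of variables, so this is only useful for checking small examples by hand; it says nothing about how to search over designs.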
slide-15
SLIDE 15

Representation as a graphical model

  • Directed Acyclic Factor Graph
  • Nodes

⊲ Variable node $n$ ≡ system variable $X_n$
⊲ Factor node $\tilde{n}$ ≡ conditional PMF $f_n$ or decision rule $g_n$

  • Edges

⊲ $(i, \tilde{n})$, for each $n \in N$ and $i \in I_n$
⊲ $(\tilde{n}, n)$, for each $n \in N$

  • Acyclic Graph

⊲ Sequential team ⇒ partial order on variable nodes ⇒ acyclic graph
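
The edge rules above translate almost directly into code. The sketch below is only an illustration: the helper names `build_dafg` and `is_acyclic`, the `~n` naming convention for factor nodes, and the one-step example are assumptions made here, not part of the talk.

```python
def build_dafg(info):
    """Return the DAFG edge list for information sets `info` (name -> list of names).

    For every variable n there is a factor node "~n", an edge (i, "~n") for
    each i in I_n, and an edge ("~n", n).
    """
    edges = []
    for n, data in info.items():
        factor = "~" + n
        edges.extend((i, factor) for i in data)
        edges.append((factor, n))
    return edges


def is_acyclic(edges):
    """Kahn-style check: repeatedly strip nodes that have no incoming edge."""
    nodes = {v for edge in edges for v in edge}
    indegree = {v: 0 for v in nodes}
    for _, head in edges:
        indegree[head] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    stripped = 0
    while ready:
        v = ready.pop()
        stripped += 1
        for tail, head in edges:
            if tail == v:
                indegree[head] -= 1
                if indegree[head] == 0:
                    ready.append(head)
    return stripped == len(nodes)


# One step of an MDP-like team: state x1, action u1 = g(x1), reward r1.
one_step = build_dafg({"x1": [], "u1": ["x1"], "r1": ["x1", "u1"]})

# Two variables that each observe the other: not a sequential team.
circular = build_dafg({"a": ["b"], "b": ["a"]})
```

Here `is_acyclic(one_step)` holds, while `circular` fails the check because neither variable can be generated before the other.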

slide-16
SLIDE 16

Graphical models – Terminology

  • parents(n)

⊲ $\{m : m \to n\}$
⊲ Parents of a control (factor) node = data observed by controller

  • children(n)

⊲ $\{m : n \to m\}$
⊲ Children of a control node = control action

  • ancestors(n)

⊲ $\{m : \exists \text{ directed path from } m \text{ to } n\}$
⊲ Ancestors of a control node = all nodes that affect the data observed

  • descendants(n)

⊲ $\{m : \exists \text{ directed path from } n \text{ to } m\}$
⊲ Descendants of a control node = all nodes affected by the control action
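
The four sets can be computed with plain graph searches. This is a small sketch, assuming the graph is stored as a dictionary from each node to its list of parents; the function names and the two-step MDP encoding (with `rho1`, `rho2` standing for ρ1, ρ2) are choices made here for illustration.

```python
def children_of(parents, node):
    """{m : node -> m}"""
    return {m for m, ps in parents.items() if node in ps}


def ancestors_of(parents, node):
    """{m : there is a directed path from m to node}"""
    seen, stack = set(), [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def descendants_of(parents, node):
    """{m : there is a directed path from node to m}"""
    seen, stack = set(), [node]
    while stack:
        for c in children_of(parents, stack.pop()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


# Two-step MDP with the original (full-history) controllers.
mdp2 = {
    "f0": [], "x1": ["f0"],
    "g1": ["x1"], "u1": ["g1"],
    "rho1": ["x1", "u1"], "r1": ["rho1"],
    "f1": ["x1", "u1"], "x2": ["f1"],
    "g2": ["x1", "x2", "u1"], "u2": ["g2"],
    "rho2": ["x2", "u2"], "r2": ["rho2"],
}

g2_parents = set(mdp2["g2"])                  # data observed by the controller
g2_children = children_of(mdp2, "g2")         # the control action
g2_ancestors = ancestors_of(mdp2, "g2")       # everything that affects the data
g2_descendants = descendants_of(mdp2, "g2")   # everything the action affects
```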

slide-17
SLIDE 17

Graphical Models — Example

[Figure: factor graph of the three-step MDP. Variable nodes: x1, x2, x3, u1, u2, u3, r1, r2, r3. Factor nodes: f0, f1, f2, ρ1, ρ2, ρ3, g1, g2, g3.]

slide-18
SLIDE 18

Graphical Models — Variable nodes

[Figure: factor graph of the three-step MDP with the variable nodes highlighted. Legend: Reward nodes, Non-reward nodes.]

slide-19
SLIDE 19

Graphical Models — Factor nodes

[Figure: factor graph of the three-step MDP with the factor nodes highlighted. Legend: Control Factors, Stochastic Factors.]

slide-20
SLIDE 20

Graphical Models — Parents and Children

[Figure: factor graph of the three-step MDP with one control factor node highlighted together with its parents and children. Legend: Parents, Children, Control factor node.]

slide-21
SLIDE 21

Graphical Models — Ancestors and descendants

[Figure: factor graph of the three-step MDP with one control factor node highlighted together with its ancestors and descendants. Legend: Ancestors, Descendants, Control factor node.]

slide-22
SLIDE 22

Structural results

  • The main idea

If some data available at a DM is independent of future rewards given the control action and other data at the DM, then that data can be ignored.

Can we automate this process?

slide-23
SLIDE 23
  • Structural result ≡ conditional independence

Graphical models can easily test conditional independence

slide-24
SLIDE 24

Conditional independence

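  • Three canonical graphs to verify $x \perp\!\!\!\perp z \mid y$

[Figure: three canonical graphs on x, y, z, drawn with factor nodes f and g: Markov chain (x → y → z), Hidden cause (x ← y → z), Explanation (x → y ← z).]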

  • Blocking of a trail

A trail from a to b is blocked by C if ∃ a node v on the trail such that either:

  • → v →, ← v ←, or ← v →, and v ∈ C; or
  • → v ← and neither v nor any of v's descendants are in C.
slide-25
SLIDE 25

Conditional independence

  • d-separation

A is d-separated from B by C if all trails from A to B are blocked by C

  • Conditional independence

For any probability measure P that factorizes according to a DAFG, A d-separated from B by C implies $X_A$ is conditionally independent of $X_B$ given $X_C$, P-a.s.

  • Efficient algorithms to verify d-separation

⊲ Moral graph
⊲ Bayes Ball
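
The blocking rule can be turned into a search over trails that are still open. The sketch below is one standard way to do this and is similar in spirit to Bayes Ball; it is not taken from the talk or its software. The graph encoding (node to list of parents) and the names `reachable` and `d_separated` are assumptions made for this example.

```python
def reachable(parents, source, given):
    """Nodes joined to `source` by a trail that `given` does not block."""
    given = set(given)
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)

    # Nodes in `given` and their ancestors (needed for the collider rule).
    anc, stack = set(given), list(given)
    while stack:
        for p in parents[stack.pop()]:
            if p not in anc:
                anc.add(p)
                stack.append(p)

    # A search state is (node, direction of arrival):
    # "up" = arrived from a child, "down" = arrived from a parent.
    found, visited, todo = set(), set(), [(source, "up")]
    while todo:
        node, direction = todo.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in given:
            found.add(node)
        if direction == "up" and node not in given:
            # chain or fork through an unobserved node stays open
            todo += [(p, "up") for p in parents[node]]
            todo += [(c, "down") for c in children[node]]
        elif direction == "down":
            if node not in given:
                # chain through an unobserved node stays open
                todo += [(c, "down") for c in children[node]]
            if node in anc:
                # collider that is observed or has an observed descendant
                todo += [(p, "up") for p in parents[node]]
    return found


def d_separated(parents, a, b, given):
    """True if every trail between nodes a and b is blocked by `given`."""
    return b not in reachable(parents, a, given)


# The three canonical graphs, without factor nodes for brevity.
chain = {"x": [], "y": ["x"], "z": ["y"]}        # x -> y -> z
fork = {"y": [], "x": ["y"], "z": ["y"]}         # x <- y -> z
collider = {"x": [], "z": [], "y": ["x", "z"]}   # x -> y <- z
```

On these graphs, conditioning on y separates x from z in the chain and the fork, while in the collider x and z are separated only when y is not observed.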

slide-26
SLIDE 26

Automated Structural results

  • First attempt

⊲ Dependent rewards: $R_d(\tilde{n}) = R \cap \mathrm{descendants}(\tilde{n})$
⊲ Irrelevant data: At a control node $\tilde{n}$, a parent $i$ is irrelevant if $R_d(\tilde{n})$ is d-separated from $i$ given $(\mathrm{parents}(\tilde{n}) \cup \mathrm{children}(\tilde{n})) \setminus \{i\}$
⊲ Requisite data: All parents that are not irrelevant

  • Structural result

⊲ Without loss of optimality, we can remove irrelevant data: $u_n = g_n(\mathrm{requisite}(\tilde{n}))$
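
The first attempt can be tried on the three-step MDP. The sketch below checks d-separation with the moral-graph test (moralize the ancestral graph, delete the conditioning set, test connectivity). The parent-map encoding, the node names `rho1` to `rho3`, and all function names are assumptions for illustration and not the talk's software.

```python
def ancestral_set(parents, nodes):
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def descendants(parents, node):
    seen, stack = set(), [node]
    while stack:
        v = stack.pop()
        for c in (m for m, ps in parents.items() if v in ps):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def d_separated(parents, a, b, given):
    """Moral-graph test for node sets a and b given the node set `given`."""
    a, b, given = set(a), set(b), set(given)
    keep = ancestral_set(parents, a | b | given)
    adj = {v: set() for v in keep}
    for v in keep:
        ps = parents[v]
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
            adj[p].update(q for q in ps if q != p)  # marry co-parents
    seen = a - given
    stack = list(seen)
    while stack:
        v = stack.pop()
        if v in b:
            return False
        for w in adj[v] - given - seen:
            seen.add(w)
            stack.append(w)
    return True


def requisite(parents, control, rewards):
    """Parents of `control` that are not irrelevant (first attempt)."""
    children = {m for m, ps in parents.items() if control in ps}
    dependent = rewards & descendants(parents, control)
    data = set(parents[control])
    return {i for i in data
            if not d_separated(parents, dependent, {i}, (data | children) - {i})}


def build_mdp():
    """Three-step MDP with the original (full-history) controllers."""
    return {
        "f0": [], "x1": ["f0"], "g1": ["x1"], "u1": ["g1"],
        "rho1": ["x1", "u1"], "r1": ["rho1"],
        "f1": ["x1", "u1"], "x2": ["f1"],
        "g2": ["x1", "x2", "u1"], "u2": ["g2"],
        "rho2": ["x2", "u2"], "r2": ["rho2"],
        "f2": ["x2", "u2"], "x3": ["f2"],
        "g3": ["x1", "x2", "x3", "u1", "u2"], "u3": ["g3"],
        "rho3": ["x3", "u3"], "r3": ["rho3"],
    }


rewards = {"r1", "r2", "r3"}
mdp = build_mdp()
req3 = requisite(mdp, "g3", rewards)   # {'x3'}
mdp["g3"] = sorted(req3)               # drop the irrelevant edges into g3
req2 = requisite(mdp, "g2", rewards)   # {'x2'}
```

In this sketch the order of processing matters: g3 is simplified before g2, because as long as g3 still reads the full history, the old data at g2 can still reach r3 through g3.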

slide-27
SLIDE 27

Structural Results for MDP — Step 1

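[Figure: factor graph of the three-step MDP before simplifying g3. Variable nodes: x1, x2, x3, u1, u2, u3, r1, r2, r3. Factor nodes: f0, f1, f2, ρ1, ρ2, ρ3, g1, g2, g3.]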

slide-28
SLIDE 28

Structural Results for MDP — Step 1

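[Figure: factor graph of the three-step MDP with control node g3 selected. Variable nodes: x1, x2, x3, u1, u2, u3, r1, r2, r3. Factor nodes: f0, f1, f2, ρ1, ρ2, ρ3, g1, g2, g3.]

  • Pick node g3.

⊲ Original: $u_3 = g_3(x_1, x_2, x_3, u_1, u_2)$
⊲ $\mathrm{requisite}(g_3) = \{x_3\}$
⊲ Thus, $u_3 = g_3(x_3)$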

slide-29
SLIDE 29

Structural Results for MDP — Step 2

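[Figure: factor graph of the three-step MDP before simplifying g2. Variable nodes: x1, x2, x3, u1, u2, u3, r1, r2, r3. Factor nodes: f0, f1, f2, ρ1, ρ2, ρ3, g1, g2, g3.]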

slide-30
SLIDE 30

Structural Results for MDP — Step 2

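[Figure: factor graph of the three-step MDP with control node g2 selected. Variable nodes: x1, x2, x3, u1, u2, u3, r1, r2, r3. Factor nodes: f0, f1, f2, ρ1, ρ2, ρ3, g1, g2, g3.]

  • Pick node g2.

⊲ Original: $u_2 = g_2(x_1, x_2, u_1)$
⊲ $\mathrm{requisite}(g_2) = \{x_2\}$
⊲ Thus, $u_2 = g_2(x_2)$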

slide-31
SLIDE 31

Structural Results for MDP — Simplified

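[Figure: simplified factor graph of the three-step MDP. Variable nodes: x1, x2, x3, u1, u2, u3, r1, r2, r3. Factor nodes: f0, f1, f2, ρ1, ρ2, ρ3, g1, g2, g3.]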

slide-32
SLIDE 32

$u_n = g_n(\mathrm{requisite}(\tilde{n}))$

Does not work for all problems . . . even when structural simplification is possible

slide-33
SLIDE 33

A real-time source coding problem

Hans S. Witsenhausen, "On the structure of real-time source coders," Bell System Technical Journal, vol. 58, no. 6, pp. 1437-1451, July-August 1979.

  • Mathematical Model

⊲ Source: First-order Markov source $\{x_t, t = 1, \dots\}$
⊲ Real-time source coder: $y_t = c_t(x_{1:t}, y_{1:t-1})$
⊲ Finite memory decoder: $\hat{x}_t = g_t(y_t, m_{t-1})$
⊲ $m_t = l_t(y_t, m_{t-1})$
⊲ Cost: $d_t = \rho_t(x_t, \hat{x}_t)$

slide-34
SLIDE 34

Model for real-time comm — Does not simplify

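[Figure: factor graph of the real-time source coding model for three time steps. Variable nodes: x1, x2, x3, y1, y2, y3, x̂1, x̂2, x̂3, m1, m2, d1, d2, d3. Factor nodes: f1, f2, f3, ρ1, ρ2, ρ3, c1, c2, c3, g1, g2, g3, l1, l2.]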

slide-35
SLIDE 35

Need to take care of deterministic variables!

slide-36
SLIDE 36

Functionally determined nodes

  • Functionally determined

⊲ $X_B$ is functionally determined by $X_A$ if $X_B \perp\!\!\!\perp X_N \mid X_A$

  • Conditional independence with functionally determined nodes

⊲ Can be checked using D-separation
⊲ Similar to d-separation: in the definition of blocking, replace “in C” with “is functionally determined by C”

  • Blocking of a trail (version that takes care of detm nodes)

A trail from a to b is blocked by C if ∃ a node v on the trail such that either:

  • → v →, ← v ←, or ← v →, and v is functionally determined by C; or
  • → v ← and neither v nor any of v's descendants are in C.
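
One simple graph-based way to find nodes that are functionally determined by C is a closure: start from C and keep adding any deterministic variable whose inputs are all already determined. This is a sufficient rule assumed here for illustration and may be narrower than the conditional-independence definition on the slide. The function name, the variable names (`xhat1` for x̂1), and the assumption that the first decoder and memory update read only y1 are all hypothetical.

```python
def functionally_determined(info, deterministic, given):
    """Closure of `given` under deterministic maps.

    info          : variable -> list of variables it is computed from
    deterministic : variables produced by a deterministic map (decision rules)
    given         : the conditioning set C
    """
    determined = set(given)
    changed = True
    while changed:
        changed = False
        for n in deterministic:
            if n not in determined and all(i in determined for i in info[n]):
                determined.add(n)
                changed = True
    return determined


# First time step of the real-time source coding model (assumed encoding):
# y1 = c1(x1), xhat1 = g1(y1), m1 = l1(y1).
info = {"y1": ["x1"], "xhat1": ["y1"], "m1": ["y1"]}
deterministic = set(info)  # all three are outputs of decision rules

# Data available to the second encoder c2 in the original model.
encoder2_data = {"x1", "x2", "y1"}
known_at_encoder2 = functionally_determined(info, deterministic, encoder2_data)
```

In this sketch `known_at_encoder2` contains m1 even though the encoder never observes it directly, which is the kind of variable that the plain d-separation test overlooks.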
slide-37
SLIDE 37

Automated Structural results

  • Second attempt

⊲ Irrelevant data: Replace d-separation with D-separation
⊲ Requisite data: All parents that are not irrelevant

  • Structural result

⊲ Without loss of optimality, we can remove irrelevant data and add appropriate functionally determined data: $u_n = g_n(\mathrm{requisite}(\tilde{n}),\ \mathrm{functionally\_detm}(\tilde{n}) \cap \mathrm{ancestors}(R_d(\tilde{n})))$

slide-38
SLIDE 38

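Let's try this!

[Figure: factor graph of the real-time source coding model for three time steps. Variable nodes: x1, x2, x3, y1, y2, y3, x̂1, x̂2, x̂3, m1, m2, d1, d2, d3. Factor nodes: f1, f2, f3, ρ1, ρ2, ρ3, c1, c2, c3, g1, g2, g3, l1, l2.]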

slide-39
SLIDE 39

Structural Results for Dec MDP — Step 1

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-40
SLIDE 40

Structural Results for Dec MDP — Step 2

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-41
SLIDE 41

Structural Results for Dec MDP — Step 3

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-42
SLIDE 42

Structural Results for Dec MDP — Step 4

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-43
SLIDE 43

Structural Results for Dec MDP — Step 5

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-44
SLIDE 44

Structural Results for Dec MDP — Step 6

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-45
SLIDE 45

Structural Results for Dec MDP — Step 7

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-46
SLIDE 46

Structural Results for Dec MDP — Step 8

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-47
SLIDE 47

Structural Results for Dec MDP — Step 9

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-48
SLIDE 48

Structural Results for Dec MDP — Step 10

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-49
SLIDE 49

Structural Results for Dec MDP — Step 11

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-50
SLIDE 50

Structural Results for Dec MDP — Step 12

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-51
SLIDE 51

Structural Results for Dec MDP — Step 13

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-52
SLIDE 52

Structural Results for Dec MDP — Step 14

[Figure: factor graph of the real-time source coding model at this simplification step. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

slide-53
SLIDE 53

Structural Results for real-time communication

  • Graphically

[Figure: simplified factor graph of the real-time source coding model. Nodes: x1..x3, y1..y3, x̂1..x̂3, m1, m2, d1..d3, f1..f3, ρ1..ρ3, c1..c3, g1..g3, l1, l2.]

  • Mathematically

⊲ Original encoder: $y_t = c_t(x_1, \dots, x_t, y_1, \dots, y_{t-1})$
⊲ New encoder: $y_t = c_t(x_t, m_{t-1})$

slide-54
SLIDE 54

Automated Structural results

  • Simplify Once

⊲ For each control node
  − Find irrelevant nodes and functionally determined nodes.
  − Remove edges from irrelevant nodes, add edges from functionally determined nodes.

  • Find fixed point

⊲ Keep on simplifying until the graph does not change

  • Software Implementation

⊲ An EDSL (embedded domain-specific language) to find structural results: http://pantheon.yale.edu/~am894/code/teams/
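
The "simplify once, then repeat until nothing changes" loop can be sketched independently of the particular simplification rule. In the sketch below the driver `simplify_to_fixed_point` is generic, while `toy_rule` is only a hypothetical stand-in for the real irrelevance test: it lets a controller drop an old state once no later controller reads it. Neither function comes from the EDSL linked above.

```python
def simplify_to_fixed_point(graph, controls, simplify_once):
    """Apply `simplify_once` to every control node until the graph is unchanged.

    Returns the final graph and the number of passes, counting the last pass
    that confirmed nothing changed.
    """
    passes = 0
    while True:
        updated = graph
        for control in controls:
            updated = simplify_once(updated, control)
        passes += 1
        if updated == graph:
            return graph, passes
        graph = updated


def toy_rule(graph, control):
    """Stand-in rule: keep the current state, and keep an old state only while
    some later controller still reads it."""
    t = int(control[1:])
    current = f"x{t}"
    later = [c for c in graph if int(c[1:]) > t]
    keep = {p for p in graph[control]
            if p == current or any(p in graph[c] for c in later)}
    return {**graph, control: frozenset(keep)}


# Control node -> data it reads (a reduced view of the three-step MDP).
original = {
    "g1": frozenset({"x1"}),
    "g2": frozenset({"x1", "x2"}),
    "g3": frozenset({"x1", "x2", "x3"}),
}
simplified, passes = simplify_to_fixed_point(original, ["g1", "g2", "g3"], toy_rule)
```

With the controllers visited in the order g1, g2, g3, the first pass only simplifies g3, the second simplifies g2, and the third confirms the fixed point. That is why a single sweep is not enough in general.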

slide-55
SLIDE 55

Conclusion

slide-56
SLIDE 56

Conclusion

An automated method to derive structural results for sequential teams

  • Future Directions

⊲ Belief States
⊲ Sequential decomposition

slide-57
SLIDE 57

Thank you