SLIDE 1

Learning chordal Markov networks by dynamic programming

Kustaa Kangas, Teppo Niinimäki, Mikko Koivisto. NIPS 2014 (to appear). November 27, 2014.

SLIDE 2

Probabilistic graphical models

Graphical model
◮ Graph structure G on the vertex set V = {1, ..., n}
◮ Represents conditional independencies in a joint distribution p(X) = p(X_1, ..., X_n)

Advantages
◮ Easy to read
◮ Compact way to store a distribution
◮ Efficient inference

SLIDE 3

Probabilistic graphical models

Directed models: Bayesian networks, ...
Undirected models: Markov networks, ...

Structure learning problem: Given samples from p(X_1, ..., X_n), find a model that best fits the sampled data.

SLIDE 4

Probabilistic graphical models

Structure learning in chordal Markov networks: Find a chordal Markov network that maximizes a given decomposable score.

Prior work:
◮ Constraint satisfaction (Corander et al.)
◮ Integer linear programming (Bartlett and Cussens)

Our result: Dynamic programming in O(4^n) time and O(3^n) space for n variables.
◮ First non-trivial bound
◮ Competitive in practice

SLIDE 5

Markov networks

◮ Joint distribution p(X) = p(X_1, ..., X_n)
◮ Undirected graph G on V = {1, ..., n} with the global Markov property: for A, B, S ⊆ V it holds that X_A ⊥⊥ X_B | X_S if S separates A and B in G.

SLIDE 9

Markov networks

If p is strictly positive, it factorizes as

p(X_1, ..., X_n) = ∏_{C ∈ C} ψ_C(X_C),

where
◮ C is the set of (maximal) cliques of G
◮ ψ_C are mappings to the positive reals
◮ X_C = {X_v : v ∈ C}

(Hammersley–Clifford theorem)

SLIDE 10

Bayesian networks

◮ Directed acyclic graph
◮ Conditional independencies by d-separation
◮ Factorizes as

p(X_1, ..., X_n) = ∏_{i=1}^n p(X_i | parents(X_i))
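
To ground the factorization, here is a minimal Python sketch (an illustration, not from the slides) that evaluates the product; the `parents` map and the `cpts` table layout are assumptions made for the example.

def joint_probability(x, parents, cpts):
    """p(x) = product over i of p(x_i | x_parents(i))."""
    p = 1.0
    for i, xi in enumerate(x):
        pa = tuple(x[j] for j in parents[i])  # observed values of i's parents
        p *= cpts[i][(xi, pa)]                # assumed CPT lookup: p(x_i | pa)
    return p

# Example: the two-node chain 0 -> 1 over binary variables.
parents = {0: (), 1: (0,)}
cpts = {
    0: {(0, ()): 0.6, (1, ()): 0.4},
    1: {(0, (0,)): 0.9, (1, (0,)): 0.1, (0, (1,)): 0.2, (1, (1,)): 0.8},
}
print(joint_probability((1, 1), parents, cpts))  # 0.4 * 0.8 = 0.32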

SLIDE 11

Bayesian and Markov networks

◮ Bayesian and Markov networks are not equivalent
◮ Chordal Markov networks are exactly the intersection of the two

SLIDE 12

Chordal graphs

◮ A chord is an edge between two non-consecutive vertices in a cycle.
◮ A graph is chordal (or triangulated) if every cycle of at least 4 vertices has a chord.
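
As an illustration (not part of the slides), chordality can be tested with maximum cardinality search: a graph is chordal exactly when the reverse of an MCS order is a perfect elimination ordering. A Python sketch, with the adjacency-dict representation as an assumption:

def is_chordal(adj):
    """Chordality test via maximum cardinality search (MCS).

    adj: dict mapping each vertex to the set of its neighbours.
    """
    order, weight, unvisited = [], {v: 0 for v in adj}, set(adj)
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])  # most visited neighbours
        order.append(v)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    pos = {v: i for i, v in enumerate(order)}
    # Classic verification: for each v, its latest earlier neighbour must be
    # adjacent to all of v's other earlier neighbours.
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if earlier:
            u = max(earlier, key=lambda w: pos[w])
            if any(w != u and w not in adj[u] for w in earlier):
                return False
    return True

# A 4-cycle is not chordal; adding the chord 1-3 makes it chordal.
print(is_chordal({1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}))        # False
print(is_chordal({1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}))  # True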

SLIDE 14

Clique tree decomposition

[Figure: an example chordal graph on vertices 1–9.]

SLIDE 16

Clique tree decomposition

[Figure: a clique tree of the example graph.]

Running intersection property: For all C_1, C_2 ∈ C, every clique on the path between C_1 and C_2 contains C_1 ∩ C_2.
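
To make the property concrete, here is a small Python sketch (an illustration, not from the slides) that verifies the running intersection property of a given clique tree; the input representation is an assumption.

from itertools import combinations

def has_running_intersection(cliques, tree_edges):
    """Check the running intersection property of a clique tree.

    cliques: list of frozensets (the tree nodes, indexed 0, 1, ...)
    tree_edges: pairs of indices forming a tree over the cliques
    """
    adj = {i: set() for i in range(len(cliques))}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)

    def path(a, b):
        """Clique indices on the unique tree path from a to b."""
        stack, prev = [a], {a: None}
        while stack:
            x = stack.pop()
            if x == b:
                break
            for y in adj[x]:
                if y not in prev:
                    prev[y] = x
                    stack.append(y)
        out = [b]
        while prev[out[-1]] is not None:
            out.append(prev[out[-1]])
        return out

    return all(cliques[i] & cliques[j] <= cliques[k]
               for i, j in combinations(range(len(cliques)), 2)
               for k in path(i, j))

# Cliques {1,2} and {2,3} of the path graph 1 - 2 - 3, joined by one edge.
print(has_running_intersection([frozenset({1, 2}), frozenset({2, 3})], [(0, 1)]))  # True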

SLIDE 19

Clique tree decomposition

[Figure: the clique tree of the example graph, with separators on the edges.]

Separator: the intersection of two adjacent cliques in a clique tree. Every clique tree of a given chordal graph has the same multiset of separators.

SLIDE 20

Clique tree decomposition

[Figure: the example chordal graph and one of its clique trees.]

Theorem: A graph is chordal if and only if it has a clique tree.

SLIDE 21

Chordal Markov networks

[Figure: the example chordal graph.]

◮ Choose ψ_i(X_{C_i}) = p(X_{C_i}) / p(X_{S_i})
◮ The factorization becomes

p(X_1, ..., X_n) = ∏_{C ∈ C} ψ_C(X_C) = ∏_{C ∈ C} p(X_C) / ∏_{S ∈ S} p(X_S),

where C and S are the sets of cliques and separators.
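
A minimal worked example (not on the slide): for the path graph 1 − 2 − 3, the maximal cliques are {1, 2} and {2, 3} with the single separator {2}, so

p(X_1, X_2, X_3) = p(X_1, X_2) · p(X_2, X_3) / p(X_2),

which encodes exactly the conditional independence X_1 ⊥⊥ X_3 | X_2.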

SLIDE 22

Structure learning

Given sampled data D from p(X_1, ..., X_n), how well does a graph structure G fit the data? Common scoring criteria decompose as

score(G) = ∏_{C ∈ C} score(C) / ∏_{S ∈ S} score(S).

Each score(C) is the probability of the data projected onto C, possibly extended with a prior or penalization term (e.g. maximum likelihood, Bayesian Dirichlet, ...).
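
In log space the decomposable score is a plain sum; a tiny Python sketch (illustrative; `log_score` is an assumed callable on vertex sets):

def log_score_graph(cliques, separators, log_score):
    """log score(G) = sum of clique log-scores minus sum of separator log-scores."""
    return (sum(log_score(c) for c in cliques)
            - sum(log_score(s) for s in separators))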

SLIDE 23

Structure learning

Structure learning problem in chordal Markov networks: Given score(C) for each C ⊆ V, find a chordal graph G that maximizes

score(G) = ∏_{C ∈ C} score(C) / ∏_{S ∈ S} score(S).

We assume each score(C) can be computed efficiently and focus on the combinatorial problem.

SLIDE 24

Structure learning

Brute-force solution:

◮ Enumerate all undirected graphs
◮ Determine which are chordal
◮ For each chordal G, find a clique tree to evaluate score(G)
◮ Running time O*(2^{n(n−1)/2}), i.e. exponential in the number of vertex pairs

SLIDE 25

Structure learning

We denote score(T) = score(G) when T is a clique tree of G.

◮ Every clique tree T uniquely specifies a chordal graph G.
◮ We can search the space of clique trees instead.

SLIDE 26

Recursive characterization

[Figure: the example clique tree.]

Let T be rooted at C with subtrees T_1, ..., T_k rooted at C_1, ..., C_k. Then

score(T) = score(C) · ∏_{i=1}^k score(T_i) / score(C ∩ C_i)
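
The characterization translates into a short recursion; the following Python sketch (assumed inputs, not the authors' code) scores a rooted clique tree in log space.

def log_score_tree(root, children, cliques, log_score):
    """score(T) = score(C) * prod_i score(T_i) / score(C ∩ C_i), in log space.

    root: index of the root clique
    children: dict mapping a clique index to the indices of its child subtrees
    cliques: list of frozensets; log_score: assumed callable on vertex sets
    """
    total = log_score(cliques[root])
    for child in children.get(root, ()):
        total += log_score_tree(child, children, cliques, log_score)
        total -= log_score(cliques[root] & cliques[child])  # separator C ∩ C_i
    return total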

SLIDE 27

Recurrence

For S ⊂ V and ∅ ⊂ R ⊆ V ∖ S, let f(S, R) be the maximum of score(G) over chordal graphs G on S ∪ R such that S is a proper subset of a clique. The solution is then given by f(∅, V).

SLIDE 28

Recurrence

For S ⊂ V and ∅ ⊂ R ⊆ V ∖ S, let f(S, R) be the maximum score(G) over chordal G on S ∪ R such that S is a proper subset of a clique. Then the solution is given by f(∅, V), and

f(S, R) = max score(C) · ∏_{i=1}^k f(S_i, R_i) / score(S_i),

where the maximum is over cliques S ⊂ C ⊆ S ∪ R, partitions {R_1, ..., R_k} ❁ R ∖ C of the remaining vertices into non-empty parts, and separators S_1, ..., S_k ⊂ C.

SLIDE 29

Recurrence

score(T) = score(C) · ∏_{i=1}^k score(T_i) / score(C ∩ C_i)

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C, S_1,...,S_k ⊂ C} score(C) · ∏_{i=1}^k f(S_i, R_i) / score(S_i)

[Figure: the roles of C, S, R and the parts R_1, R_2, R_3 in the recurrence.]
SLIDE 30

Recurrence

score(T) = score(C) · ∏_{i=1}^k score(T_i) / score(C ∩ C_i)

f(R) = max_{∅ ⊂ C ⊆ R, {R_1,...,R_k} ❁ R∖C, S_1,...,S_k ⊂ C} score(C) · ∏_{i=1}^k f(S_i ∪ R_i) / score(S_i)

(Presumably an intermediate step: this one-argument variant f(R) loses the constraint that each S_i lies inside a clique of the subproblem, which motivates keeping the two arguments in f(S, R).)

[Figure: the roles of C, S, R and the parts R_1, R_2, R_3.]
SLIDE 31

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C, S_1,...,S_k ⊂ C} score(C) · ∏_{i=1}^k f(S_i, R_i) / score(S_i)

SLIDE 32

Recurrence

Maximizing over each S_i independently inside the product:

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C} score(C) · ∏_{i=1}^k max_{S_i ⊂ C} f(S_i, R_i) / score(S_i)

SLIDE 33

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C} score(C) · ∏_{i=1}^k max_{S_i ⊂ C} f(S_i, R_i) / score(S_i)

Define

h(C, R) = max_{S ⊂ C} f(S, R) / score(S)

SLIDE 34

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C} score(C) · ∏_{i=1}^k h(C, R_i)

h(C, R) = max_{S ⊂ C} f(S, R) / score(S)

SLIDE 35

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C} score(C) · ∏_{i=1}^k h(C, R_i)

SLIDE 36

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R} score(C) · max_{{R_1,...,R_k} ❁ R∖C} ∏_{i=1}^k h(C, R_i)

SLIDE 37

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R} score(C) · max_{{R_1,...,R_k} ❁ R∖C} ∏_{i=1}^k h(C, R_i)

Define

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

SLIDE 38

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R} score(C) · g(C, R ∖ C)

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

SLIDE 39

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

SLIDE 40

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product).

SLIDE 41

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

SLIDE 42

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise, singling out the part R_1:

g(C, U) = max_{∅ ≠ R_1 ⊆ U} max_{{R_2,...,R_k} ❁ U∖R_1} ∏_{i=1}^k h(C, R_i)

SLIDE 43

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise

g(C, U) = max_{∅ ≠ R_1 ⊆ U} h(C, R_1) · max_{{R_2,...,R_k} ❁ U∖R_1} ∏_{i=2}^k h(C, R_i)

SLIDE 44

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise the inner maximum is g(C, U ∖ R_1) itself:

g(C, U) = max_{∅ ≠ R_1 ⊆ U} h(C, R_1) · g(C, U ∖ R_1)

SLIDE 45

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise

g(C, U) = max_{∅ ≠ R ⊆ U} h(C, R) · g(C, U ∖ R)

SLIDE 46

Recurrence

We have thus split the recurrence into three simpler ones:

f(S, R) = max_{S ⊂ C ⊆ S∪R} score(C) · g(C, R ∖ C)

g(C, U) = max_{∅ ⊂ R ⊆ U} h(C, R) · g(C, U ∖ R)

h(C, R) = max_{S ⊂ C} f(S, R) / score(S)

Dynamic programming in increasing order of set size. Space: O(3^n). Time: O(4^n).
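
The three recurrences translate directly into a bitmask dynamic program. The sketch below is an illustrative Python reconstruction under stated assumptions, not the authors' Junctor implementation: vertex sets are bitmasks, `log_score` is an assumed callable on masks with log_score(∅) = 0, and the tables are filled in increasing order of set size. It returns only the optimal score; the network itself is recovered by additionally storing the maximizing choices.

def learn_chordal_score(n, log_score):
    """Maximum log-score over chordal Markov networks on n variables.

    Log-space recurrences:
      f(S, R) = max over S ⊂ C ⊆ S∪R of log_score(C) + g(C, R∖C)
      g(C, U) = max over ∅ ⊂ R ⊆ U of h(C, R) + g(C, U∖R);  g(C, ∅) = 0
      h(C, R) = max over S ⊂ C of f(S, R) − log_score(S)
    All tables are keyed by pairs of disjoint masks (O(3^n) space); the
    submask enumerations give O(4^n) time in total.
    """
    full = (1 << n) - 1

    def submasks(m):
        """All submasks of m, from m down to 0."""
        s = m
        while True:
            yield s
            if s == 0:
                return
            s = (s - 1) & m

    f, g, h = {}, {}, {}
    for C in range(1 << n):
        g[(C, 0)] = 0.0  # empty product

    # Process R in increasing order of |R| so every subproblem is ready.
    for R in sorted(range(1, 1 << n), key=lambda m: bin(m).count("1")):
        rest = full ^ R
        for S in submasks(rest):
            # f(S, R): pick the root clique C = S | D for a non-empty D ⊆ R.
            f[(S, R)] = max(log_score(S | D) + g[(S | D, R ^ D)]
                            for D in submasks(R) if D)
        for C in submasks(rest):
            if C == 0:
                continue
            # h(C, R): best subtree attached to C via a separator S ⊂ C.
            h[(C, R)] = max(f[(S, R)] - log_score(S)
                            for S in submasks(C) if S != C)
            # g(C, R): split one non-empty part off the partition of R.
            g[(C, R)] = max(h[(C, Rp)] + g[(C, R ^ Rp)]
                            for Rp in submasks(R) if Rp)

    return f[(0, full)]

# Illustrative run: with log_score(C) = -|C|, every chordal network on three
# vertices scores -3, since cliques minus separators cover each vertex once.
print(learn_chordal_score(3, lambda mask: -bin(mask).count("1")))  # -3.0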

SLIDE 47

Efficient indexing

For each pair (A, B) of disjoint sets, compute the index

∑_{v=1}^n 3^{v−1} · I_v(A, B),  where I_v(A, B) = 1 if v ∈ A, 2 if v ∈ B, and 0 otherwise.
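
A direct transcription in Python (a sketch; representing sets as bitmasks over bits 0..n−1 is an assumption):

def pair_index(a_mask, b_mask, n):
    """Index of a disjoint pair (A, B) in a table of size 3^n."""
    assert a_mask & b_mask == 0, "A and B must be disjoint"
    index, power = 0, 1
    for v in range(n):
        if a_mask >> v & 1:    # I_v = 1 for v ∈ A
            index += power
        elif b_mask >> v & 1:  # I_v = 2 for v ∈ B
            index += 2 * power
        power *= 3             # weight 3^(v−1) in the slide's 1-based terms
    return index

print(pair_index(0b001, 0b010, 3))  # 1·1 + 2·3 = 7

Every disjoint pair gets a distinct slot in 0 .. 3^n − 1, which is what makes the O(3^n)-space tables addressable in constant time.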

SLIDE 48

Experiments

[Figure: running times (1 s / 60 s / 1 h scale) against the number of variables (8–18), one panel per width bound w = 3, 4, 5, 6; curves for Junctor (any) and for GOBNILP on small, medium, and large instances.]

SLIDE 49

Experiments

Dataset      Abbr.  n   m
Tic-tac-toe  X      10  958
Poker        P      11  10000
Bridges      B      12  108
Flare        F      13  1066
Zoo          Z      17  101
Voting       V      17  435
Tumor        T      18  339
Lymph        L      19  148
Hypothyroid         22  3772
Mushroom            22  8124

[Figure: Junctor vs. GOBNILP running times (1 s / 60 s / 1 h) on datasets B, F, L, P, X, T, V, Z, one panel per width bound w = 3, 4, 5, 6.]

SLIDE 50

Thank you!
