SLIDE 1

Learning chordal Markov networks by dynamic programming

Kustaa Kangas, Teppo Niinimäki, Mikko Koivisto. NIPS 2014 (to appear). November 27, 2014.

SLIDE 2

Probabilistic graphical models

Graphical model
◮ Graph structure G on the vertex set V = {1, ..., n}
◮ Represents conditional independencies in a joint distribution p(X) = p(X_1, ..., X_n)

Advantages
◮ Easy to read
◮ Compact way to store a distribution
◮ Efficient inference

SLIDE 3

Probabilistic graphical models

Directed models: Bayesian networks, ...
Undirected models: Markov networks, ...

Structure learning problem: Given samples from p(X_1, ..., X_n), find a model that best fits the sampled data.

SLIDE 4

Probabilistic graphical models

Structure learning in chordal Markov networks: Find a chordal Markov network that maximizes a given decomposable score.

Prior work:
◮ Constraint satisfaction (Corander et al.)
◮ Integer linear programming (Bartlett and Cussens)

Our result: Dynamic programming in O(4^n) time and O(3^n) space for n variables.
◮ First non-trivial bound
◮ Competitive in practice

SLIDE 5

Markov networks

◮ Joint distribution p(X) = p(X_1, ..., X_n)
◮ Undirected graph G on V = {1, ..., n} with the global Markov property: for A, B, S ⊆ V it holds that X_A ⊥⊥ X_B | X_S if S separates A and B in G.

SLIDE 9

Markov networks

If p is strictly positive, it factorizes as

p(X_1, ..., X_n) = ∏_{C ∈ C} ψ_C(X_C),

where
◮ C is the set of (maximal) cliques of G
◮ ψ_C are mappings to the positive reals
◮ X_C = {X_v : v ∈ C}

(Hammersley–Clifford theorem)

SLIDE 10

Bayesian networks

◮ Directed acyclic graph
◮ Conditional independencies by d-separation
◮ Factorizes as

p(X_1, ..., X_n) = ∏_{i=1}^n p(X_i | parents(X_i))
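
To ground the factorization, here is a minimal Python sketch (an illustration, not from the slides) that evaluates the product; the `parents` map and the `cpts` table layout are assumptions made for the example.

def joint_probability(x, parents, cpts):
    """p(x) = product over i of p(x_i | x_parents(i))."""
    p = 1.0
    for i, xi in enumerate(x):
        pa = tuple(x[j] for j in parents[i])  # observed values of i's parents
        p *= cpts[i][(xi, pa)]                # assumed CPT lookup: p(x_i | pa)
    return p

# Example: the two-node chain 0 -> 1 over binary variables.
parents = {0: (), 1: (0,)}
cpts = {
    0: {(0, ()): 0.6, (1, ()): 0.4},
    1: {(0, (0,)): 0.9, (1, (0,)): 0.1, (0, (1,)): 0.2, (1, (1,)): 0.8},
}
print(joint_probability((1, 1), parents, cpts))  # 0.4 * 0.8 = 0.32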

SLIDE 11

Bayesian and Markov networks

◮ Bayesian and Markov networks are not equivalent
◮ Chordal Markov networks are exactly the intersection of the two

SLIDE 12

Chordal graphs

◮ A chord is an edge between two non-consecutive vertices in a cycle.
◮ A graph is chordal (or triangulated) if every cycle of at least 4 vertices has a chord.
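
As an illustration (not part of the slides), chordality can be tested with maximum cardinality search: a graph is chordal exactly when the reverse of an MCS order is a perfect elimination ordering. A Python sketch, with the adjacency-dict representation as an assumption:

def is_chordal(adj):
    """Chordality test via maximum cardinality search (MCS).

    adj: dict mapping each vertex to the set of its neighbours.
    """
    order, weight, unvisited = [], {v: 0 for v in adj}, set(adj)
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])  # most visited neighbours
        order.append(v)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    pos = {v: i for i, v in enumerate(order)}
    # Classic verification: for each v, its latest earlier neighbour must be
    # adjacent to all of v's other earlier neighbours.
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if earlier:
            u = max(earlier, key=lambda w: pos[w])
            if any(w != u and w not in adj[u] for w in earlier):
                return False
    return True

# A 4-cycle is not chordal; adding the chord 1-3 makes it chordal.
print(is_chordal({1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}))        # False
print(is_chordal({1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}))  # True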

SLIDE 14

Clique tree decomposition

[Figure: an example chordal graph on vertices 1–9.]

SLIDE 16

Clique tree decomposition

[Figure: a clique tree of the example graph.]

Running intersection property: For all C_1, C_2 ∈ C, every clique on the path between C_1 and C_2 contains C_1 ∩ C_2.
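
To make the property concrete, here is a small Python sketch (an illustration, not from the slides) that verifies the running intersection property of a given clique tree; the input representation is an assumption.

from itertools import combinations

def has_running_intersection(cliques, tree_edges):
    """Check the running intersection property of a clique tree.

    cliques: list of frozensets (the tree nodes, indexed 0, 1, ...)
    tree_edges: pairs of indices forming a tree over the cliques
    """
    adj = {i: set() for i in range(len(cliques))}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)

    def path(a, b):
        """Clique indices on the unique tree path from a to b."""
        stack, prev = [a], {a: None}
        while stack:
            x = stack.pop()
            if x == b:
                break
            for y in adj[x]:
                if y not in prev:
                    prev[y] = x
                    stack.append(y)
        out = [b]
        while prev[out[-1]] is not None:
            out.append(prev[out[-1]])
        return out

    return all(cliques[i] & cliques[j] <= cliques[k]
               for i, j in combinations(range(len(cliques)), 2)
               for k in path(i, j))

# Cliques {1,2} and {2,3} of the path graph 1 - 2 - 3, joined by one edge.
print(has_running_intersection([frozenset({1, 2}), frozenset({2, 3})], [(0, 1)]))  # True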

SLIDE 19

Clique tree decomposition

[Figure: the clique tree of the example graph, with separators on the edges.]

Separator: the intersection of two adjacent cliques in a clique tree. Every clique tree of a given chordal graph has the same multiset of separators.

SLIDE 20

Clique tree decomposition

[Figure: the example chordal graph and one of its clique trees.]

Theorem: A graph is chordal if and only if it has a clique tree.

SLIDE 21

Chordal Markov networks

[Figure: the example chordal graph.]

◮ Choose ψ_i(X_{C_i}) = p(X_{C_i}) / p(X_{S_i})
◮ The factorization becomes

p(X_1, ..., X_n) = ∏_{C ∈ C} ψ_C(X_C) = ∏_{C ∈ C} p(X_C) / ∏_{S ∈ S} p(X_S),

where C and S are the sets of cliques and separators.
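
A minimal worked example (not on the slide): for the path graph 1 − 2 − 3, the maximal cliques are {1, 2} and {2, 3} with the single separator {2}, so

p(X_1, X_2, X_3) = p(X_1, X_2) · p(X_2, X_3) / p(X_2),

which encodes exactly the conditional independence X_1 ⊥⊥ X_3 | X_2.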

SLIDE 22

Structure learning

Given sampled data D from p(X_1, ..., X_n), how well does a graph structure G fit the data? Common scoring criteria decompose as

score(G) = ∏_{C ∈ C} score(C) / ∏_{S ∈ S} score(S).

Each score(C) is the probability of the data projected onto C, possibly extended with a prior or penalization term (e.g. maximum likelihood, Bayesian Dirichlet, ...).
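
In log space the decomposable score is a plain sum; a tiny Python sketch (illustrative; `log_score` is an assumed callable on vertex sets):

def log_score_graph(cliques, separators, log_score):
    """log score(G) = sum of clique log-scores minus sum of separator log-scores."""
    return (sum(log_score(c) for c in cliques)
            - sum(log_score(s) for s in separators))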

SLIDE 23

Structure learning

Structure learning problem in chordal Markov networks: Given score(C) for each C ⊆ V, find a chordal graph G that maximizes

score(G) = ∏_{C ∈ C} score(C) / ∏_{S ∈ S} score(S).

We assume each score(C) can be computed efficiently and focus on the combinatorial problem.

SLIDE 24

Structure learning

Brute-force solution:

◮ Enumerate all undirected graphs
◮ Determine which are chordal
◮ For each chordal G, find a clique tree to evaluate score(G)
◮ Running time O*(2^{n(n−1)/2}), i.e. exponential in the number of vertex pairs

SLIDE 25

Structure learning

We denote score(T) = score(G) when T is a clique tree of G.

◮ Every clique tree T uniquely specifies a chordal graph G.
◮ We can search the space of clique trees instead.

SLIDE 26

Recursive characterization

[Figure: the example clique tree.]

Let T be rooted at C with subtrees T_1, ..., T_k rooted at C_1, ..., C_k. Then

score(T) = score(C) · ∏_{i=1}^k score(T_i) / score(C ∩ C_i)
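
The characterization translates into a short recursion; the following Python sketch (assumed inputs, not the authors' code) scores a rooted clique tree in log space.

def log_score_tree(root, children, cliques, log_score):
    """score(T) = score(C) * prod_i score(T_i) / score(C ∩ C_i), in log space.

    root: index of the root clique
    children: dict mapping a clique index to the indices of its child subtrees
    cliques: list of frozensets; log_score: assumed callable on vertex sets
    """
    total = log_score(cliques[root])
    for child in children.get(root, ()):
        total += log_score_tree(child, children, cliques, log_score)
        total -= log_score(cliques[root] & cliques[child])  # separator C ∩ C_i
    return total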

SLIDE 27

Recurrence

For S ⊂ V and ∅ ⊂ R ⊆ V ∖ S, let f(S, R) be the maximum of score(G) over chordal graphs G on S ∪ R such that S is a proper subset of a clique. The solution is then given by f(∅, V).

SLIDE 28

Recurrence

For S ⊂ V and ∅ ⊂ R ⊆ V ∖ S, let f(S, R) be the maximum score(G) over chordal G on S ∪ R such that S is a proper subset of a clique. Then the solution is given by f(∅, V), and

f(S, R) = max score(C) · ∏_{i=1}^k f(S_i, R_i) / score(S_i),

where the maximum is over cliques S ⊂ C ⊆ S ∪ R, partitions {R_1, ..., R_k} ❁ R ∖ C of the remaining vertices into non-empty parts, and separators S_1, ..., S_k ⊂ C.

SLIDE 29

Recurrence

score(T) = score(C) · ∏_{i=1}^k score(T_i) / score(C ∩ C_i)

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C, S_1,...,S_k ⊂ C} score(C) · ∏_{i=1}^k f(S_i, R_i) / score(S_i)

[Figure: the roles of C, S, R and the parts R_1, R_2, R_3 in the recurrence.]
SLIDE 30

Recurrence

score(T) = score(C) · ∏_{i=1}^k score(T_i) / score(C ∩ C_i)

f(R) = max_{∅ ⊂ C ⊆ R, {R_1,...,R_k} ❁ R∖C, S_1,...,S_k ⊂ C} score(C) · ∏_{i=1}^k f(S_i ∪ R_i) / score(S_i)

(Presumably an intermediate step: this one-argument variant f(R) loses the constraint that each S_i lies inside a clique of the subproblem, which motivates keeping the two arguments in f(S, R).)

[Figure: the roles of C, S, R and the parts R_1, R_2, R_3.]
SLIDE 31

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C, S_1,...,S_k ⊂ C} score(C) · ∏_{i=1}^k f(S_i, R_i) / score(S_i)

SLIDE 32

Recurrence

Maximizing over each S_i independently inside the product:

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C} score(C) · ∏_{i=1}^k max_{S_i ⊂ C} f(S_i, R_i) / score(S_i)

SLIDE 33

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C} score(C) · ∏_{i=1}^k max_{S_i ⊂ C} f(S_i, R_i) / score(S_i)

Define

h(C, R) = max_{S ⊂ C} f(S, R) / score(S)

SLIDE 34

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C} score(C) · ∏_{i=1}^k h(C, R_i)

h(C, R) = max_{S ⊂ C} f(S, R) / score(S)

SLIDE 35

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R, {R_1,...,R_k} ❁ R∖C} score(C) · ∏_{i=1}^k h(C, R_i)

SLIDE 36

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R} score(C) · max_{{R_1,...,R_k} ❁ R∖C} ∏_{i=1}^k h(C, R_i)

SLIDE 37

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R} score(C) · max_{{R_1,...,R_k} ❁ R∖C} ∏_{i=1}^k h(C, R_i)

Define

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

SLIDE 38

Recurrence

f(S, R) = max_{S ⊂ C ⊆ S∪R} score(C) · g(C, R ∖ C)

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

SLIDE 39

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

SLIDE 40

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product).

SLIDE 41

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

SLIDE 42

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise, singling out the part R_1:

g(C, U) = max_{∅ ≠ R_1 ⊆ U} max_{{R_2,...,R_k} ❁ U∖R_1} ∏_{i=1}^k h(C, R_i)

SLIDE 43

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise

g(C, U) = max_{∅ ≠ R_1 ⊆ U} h(C, R_1) · max_{{R_2,...,R_k} ❁ U∖R_1} ∏_{i=2}^k h(C, R_i)

SLIDE 44

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise the inner maximum is g(C, U ∖ R_1) itself:

g(C, U) = max_{∅ ≠ R_1 ⊆ U} h(C, R_1) · g(C, U ∖ R_1)

SLIDE 45

Recurrence

g(C, U) = max_{{R_1,...,R_k} ❁ U} ∏_{i=1}^k h(C, R_i)

If U = ∅, then g(C, U) = 1 (empty product). Otherwise

g(C, U) = max_{∅ ≠ R ⊆ U} h(C, R) · g(C, U ∖ R)

SLIDE 46

Recurrence

We have thus split the recurrence into three simpler ones:

f(S, R) = max_{S ⊂ C ⊆ S∪R} score(C) · g(C, R ∖ C)

g(C, U) = max_{∅ ⊂ R ⊆ U} h(C, R) · g(C, U ∖ R)

h(C, R) = max_{S ⊂ C} f(S, R) / score(S)

Dynamic programming in increasing order of set size. Space: O(3^n). Time: O(4^n).
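
The three recurrences translate directly into a bitmask dynamic program. The sketch below is an illustrative Python reconstruction under stated assumptions, not the authors' Junctor implementation: vertex sets are bitmasks, `log_score` is an assumed callable on masks with log_score(∅) = 0, and the tables are filled in increasing order of set size. It returns only the optimal score; the network itself is recovered by additionally storing the maximizing choices.

def learn_chordal_score(n, log_score):
    """Maximum log-score over chordal Markov networks on n variables.

    Log-space recurrences:
      f(S, R) = max over S ⊂ C ⊆ S∪R of log_score(C) + g(C, R∖C)
      g(C, U) = max over ∅ ⊂ R ⊆ U of h(C, R) + g(C, U∖R);  g(C, ∅) = 0
      h(C, R) = max over S ⊂ C of f(S, R) − log_score(S)
    All tables are keyed by pairs of disjoint masks (O(3^n) space); the
    submask enumerations give O(4^n) time in total.
    """
    full = (1 << n) - 1

    def submasks(m):
        """All submasks of m, from m down to 0."""
        s = m
        while True:
            yield s
            if s == 0:
                return
            s = (s - 1) & m

    f, g, h = {}, {}, {}
    for C in range(1 << n):
        g[(C, 0)] = 0.0  # empty product

    # Process R in increasing order of |R| so every subproblem is ready.
    for R in sorted(range(1, 1 << n), key=lambda m: bin(m).count("1")):
        rest = full ^ R
        for S in submasks(rest):
            # f(S, R): pick the root clique C = S | D for a non-empty D ⊆ R.
            f[(S, R)] = max(log_score(S | D) + g[(S | D, R ^ D)]
                            for D in submasks(R) if D)
        for C in submasks(rest):
            if C == 0:
                continue
            # h(C, R): best subtree attached to C via a separator S ⊂ C.
            h[(C, R)] = max(f[(S, R)] - log_score(S)
                            for S in submasks(C) if S != C)
            # g(C, R): split one non-empty part off the partition of R.
            g[(C, R)] = max(h[(C, Rp)] + g[(C, R ^ Rp)]
                            for Rp in submasks(R) if Rp)

    return f[(0, full)]

# Illustrative run: with log_score(C) = -|C|, every chordal network on three
# vertices scores -3, since cliques minus separators cover each vertex once.
print(learn_chordal_score(3, lambda mask: -bin(mask).count("1")))  # -3.0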

SLIDE 47

Efficient indexing

For each pair (A, B) of disjoint sets, compute the index

∑_{v=1}^n 3^{v−1} · I_v(A, B),  where I_v(A, B) = 1 if v ∈ A, 2 if v ∈ B, and 0 otherwise.
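
A direct transcription in Python (a sketch; representing sets as bitmasks over bits 0..n−1 is an assumption):

def pair_index(a_mask, b_mask, n):
    """Index of a disjoint pair (A, B) in a table of size 3^n."""
    assert a_mask & b_mask == 0, "A and B must be disjoint"
    index, power = 0, 1
    for v in range(n):
        if a_mask >> v & 1:    # I_v = 1 for v ∈ A
            index += power
        elif b_mask >> v & 1:  # I_v = 2 for v ∈ B
            index += 2 * power
        power *= 3             # weight 3^(v−1) in the slide's 1-based terms
    return index

print(pair_index(0b001, 0b010, 3))  # 1·1 + 2·3 = 7

Every disjoint pair gets a distinct slot in 0 .. 3^n − 1, which is what makes the O(3^n)-space tables addressable in constant time.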

SLIDE 48

Experiments

[Figure: running times (1 s / 60 s / 1 h scale) against the number of variables (8–18), one panel per width bound w = 3, 4, 5, 6; curves for Junctor (any) and for GOBNILP on small, medium, and large instances.]

SLIDE 49

Experiments

Dataset      Abbr.  n   m
Tic-tac-toe  X      10  958
Poker        P      11  10000
Bridges      B      12  108
Flare        F      13  1066
Zoo          Z      17  101
Voting       V      17  435
Tumor        T      18  339
Lymph        L      19  148
Hypothyroid         22  3772
Mushroom            22  8124

[Figure: Junctor vs. GOBNILP running times (1 s / 60 s / 1 h) on datasets B, F, L, P, X, T, V, Z, one panel per width bound w = 3, 4, 5, 6.]

SLIDE 50

Thank you!
