Markov Networks
CS 886, University of Waterloo
March 2, 2010
CS886 Lecture Slides (c) 2010 P. Poupart
Outline
- Markov networks (a.k.a. Markov random fields)
- Reading: Michael Jordan (2004). Graphical Models. Statistical Science (Special Issue on Bayesian Statistics), 19, 140-155.
Recall Bayesian networks
- Directed acyclic graph
- Arcs often interpreted as causal relationships
- Joint distribution: product of conditional distributions
[Figure: DAG over Cloudy, Sprinkler, Rain, Wet grass]
Markov networks
- Undirected graph
- Arcs simply indicate direct correlations
- Joint distribution: normalized product of potentials
- Popular in computer vision and natural language processing
[Figure: undirected graph over Cloudy, Sprinkler, Rain, Wet grass]
Parameterization
- Joint: normalized product of potentials
  Pr(X) = (1/k) Π_j f_j(CLIQUE_j) = (1/k) f1(C,S,R) f2(S,R,W)
  where k is a normalization constant:
  k = Σ_X Π_j f_j(CLIQUE_j) = Σ_{C,S,R,W} f1(C,S,R) f2(S,R,W)
- Potential f_j:
  – A non-negative factor
  – One potential per maximal clique in the graph
  – Entries indicate the "likelihood strength" of different configurations
[Figure: undirected graph over Cloudy, Sprinkler, Rain, Wet grass]
A sketch of this computation appears below.
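The following Python sketch enumerates all assignments to compute k and the joint for a two-clique network with this structure. The potential entries are invented for illustration; only the clique structure follows the slide.

    import itertools

    # Hypothetical potentials over the maximal cliques {C,S,R} and {S,R,W};
    # the entries are invented for illustration.
    f1 = {(c, s, r): 1.0 + c + 2 * s * r
          for c, s, r in itertools.product([0, 1], repeat=3)}
    f2 = {(s, r, w): 1.0 + 3 * w * max(s, r)
          for s, r, w in itertools.product([0, 1], repeat=3)}

    # Normalization constant: sum the product of potentials over all assignments.
    k = sum(f1[c, s, r] * f2[s, r, w]
            for c, s, r, w in itertools.product([0, 1], repeat=4))

    def pr(c, s, r, w):
        """Joint probability: normalized product of the clique potentials."""
        return f1[c, s, r] * f2[s, r, w] / k

    # The joint sums to 1 by construction.
    assert abs(sum(pr(*x) for x in itertools.product([0, 1], repeat=4)) - 1) < 1e-9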
Potential Example
f1(C,S,R):
  ~c~s~r: 7       c~s~r: 5.5
  ~c~sr:  0       c~sr:  5
  ~cs~r:  2.5     cs~r:  2.5
  ~csr:   0       csr:   3
A value of 0 marks an impossible configuration (here ~c~sr and ~csr). Since f1(c~sr) = 5 > f1(cs~r) = 2.5, c~sr is more likely than cs~r.
Markov property
- Markov property: a variable is independent of all other variables given its immediate neighbours
- Markov blanket: the set of direct neighbours, e.g. MB(A) = {B,C,D,E}
[Figure: undirected graph with A connected to B, C, D, E]
Conditional Independence
- X and Y are independent given Z iff every path between X and Y contains at least one variable in Z (see the sketch below)
- Exercise:
  – A,E?
  – A,E|D?
  – A,E|C?
  – A,E|B,C?
[Figure: undirected graph over A, B, C, D, E, F, G, H]
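This separation test is plain graph reachability once the conditioning set Z is removed. A minimal sketch follows; the function name `independent` and the edge list are hypothetical, since the slide's figure is not reproduced here.

    from collections import deque

    def independent(graph, x, y, z):
        """X and Y are conditionally independent given Z iff every path
        between them passes through Z, i.e. y is unreachable from x once
        the nodes in z are deleted from the graph."""
        z = set(z)
        seen, queue = {x}, deque([x])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr == y:
                    return False          # found a path avoiding Z
                if nbr not in seen and nbr not in z:
                    seen.add(nbr)
                    queue.append(nbr)
        return True

    # Hypothetical edges (not the graph from the slide's figure):
    graph = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'],
             'D': ['B', 'C', 'E'], 'E': ['D']}
    print(independent(graph, 'A', 'E', []))     # False: path A-B-D-E avoids Z
    print(independent(graph, 'A', 'E', ['D']))  # True: D blocks every path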
Interpretation
- The Markov property has a price: the numbers in the potentials are not probabilities
- What are potentials? They are indicative of local correlations
- What do the numbers mean?
  – They are indicative of the likelihood of each configuration
  – They are usually learnt from data, since their lack of a clear interpretation makes them hard to specify by hand
Applications
- Natural language processing:
  – Part-of-speech tagging
- Computer vision:
  – Image segmentation
- Any other application where there is no clear causal relationship
Image Segmentation
[Figure: segmentation of the Alps]
Kervrann, C. and Heitz, F. (1995). A Markov Random Field Model-Based Approach to Unsupervised Texture Segmentation Using Local and Global Spatial Statistics. IEEE Transactions on Image Processing, 4(6), 856-862.
Image Segmentation
- Variables:
  – Pixel features (e.g. intensities): X_ij
  – Pixel labels: Y_ij
- Correlations:
  – Neighbouring pixel labels are correlated
  – The label and the features of a pixel are correlated
- Segmentation: compute argmax_Y Pr(Y|X), as sketched below
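A toy MAP-segmentation sketch on a 2x2 "image" with invented intensities and potentials. Real systems use approximate inference rather than this brute-force argmax; the point is only how the pairwise and local correlations enter the model.

    import itertools, math

    # Toy 2x2 image: invented intensities X, binary labels Y (one per pixel).
    H, W = 2, 2
    X = {(0, 0): 40, (0, 1): 60, (1, 0): 190, (1, 1): 210}
    pixels = list(X)
    # Edges between horizontally and vertically neighbouring pixels.
    edges = [((i, j), (i, j + 1)) for i in range(H) for j in range(W - 1)] + \
            [((i, j), (i + 1, j)) for i in range(H - 1) for j in range(W)]

    def label_potential(y1, y2):
        """Neighbouring labels prefer to agree (invented strengths)."""
        return 2.0 if y1 == y2 else 1.0

    def obs_potential(y, x):
        """A pixel's label is correlated with its intensity (invented model)."""
        mean = 200 if y == 1 else 50
        return math.exp(-(x - mean) ** 2 / (2 * 30.0 ** 2))

    def score(Y):
        """Unnormalized Pr(Y, X): product of all pairwise and local potentials."""
        s = 1.0
        for a, b in edges:
            s *= label_potential(Y[a], Y[b])
        for p in pixels:
            s *= obs_potential(Y[p], X[p])
        return s

    # argmax_Y Pr(Y|X) by brute force (the normalizer does not affect the argmax).
    best = max((dict(zip(pixels, ys))
                for ys in itertools.product([0, 1], repeat=len(pixels))),
               key=score)
    print(best)  # labels the dark pixels 0 and the bright pixels 1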
Inference
- Markov nets have a factored representation, so variable elimination applies
- To compute P(X|E=e):
  – Restrict all factors that contain E to e
  – Sum out all variables that are not X or in E
  – Normalize the answer
A sketch of these steps appears below.
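A brute-force sketch of these three steps on table factors. The function names, variable names, and potential entries are my own for illustration; a real variable-elimination implementation would interleave multiplication and summing out along an elimination ordering instead of building the full joint.

    import itertools

    # A factor is a pair (vars, table): a tuple of variable names plus a dict
    # mapping each assignment (a tuple of 0/1 values, in vars order) to a number.

    def restrict(factor, var, value):
        """Fix an evidence variable to its observed value."""
        vs, table = factor
        if var not in vs:
            return factor
        i = vs.index(var)
        return (vs[:i] + vs[i+1:],
                {a[:i] + a[i+1:]: p for a, p in table.items() if a[i] == value})

    def multiply(f, g):
        """Pointwise product over the union of the two factors' variables."""
        (fv, ft), (gv, gt) = f, g
        vs = fv + tuple(v for v in gv if v not in fv)
        table = {}
        for a in itertools.product([0, 1], repeat=len(vs)):
            asg = dict(zip(vs, a))
            table[a] = (ft[tuple(asg[v] for v in fv)]
                        * gt[tuple(asg[v] for v in gv)])
        return (vs, table)

    def sumout(factor, var):
        """Marginalize a variable out of a factor."""
        vs, table = factor
        i = vs.index(var)
        out = {}
        for a, p in table.items():
            key = a[:i] + a[i+1:]
            out[key] = out.get(key, 0.0) + p
        return (vs[:i] + vs[i+1:], out)

    def query(factors, x, evidence):
        """P(x | evidence): restrict, multiply, sum out, normalize."""
        for var, val in evidence.items():
            factors = [restrict(f, var, val) for f in factors]
        joint = factors[0]
        for f in factors[1:]:
            joint = multiply(joint, f)
        for var in [v for v in joint[0] if v != x]:
            joint = sumout(joint, var)
        k = sum(joint[1].values())
        return {a[0]: p / k for a, p in joint[1].items()}

    # Hypothetical potentials for the sprinkler network above:
    f1 = (('C', 'S', 'R'),
          {a: 1.0 + sum(a) for a in itertools.product([0, 1], repeat=3)})
    f2 = (('S', 'R', 'W'),
          {a: 1.0 + 2.0 * a[2] * max(a[0], a[1])
           for a in itertools.product([0, 1], repeat=3)})
    print(query([f1, f2], 'C', {'W': 1}))  # P(C | W=1)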
Parameter Learning
- Maximum likelihood: θ* = argmax_θ P(data|θ)
- Complete data:
  – Convex optimization, but no closed-form solution
  – Iterative techniques such as gradient descent
- Incomplete data:
  – Non-convex optimization
  – EM algorithm
Maximum likelihood
- Let θ be the set of parameters and x_i be the i-th instance in the dataset
- Optimization problem:
  θ* = argmax_θ P(data|θ)
     = argmax_θ Π_i Pr(x_i|θ)
     = argmax_θ Π_i [Π_j f(X[j] = x_i[j])] / [Σ_X Π_j f(X[j] = x[j])]
  where X[j] is the clique of variables that potential j depends on and x[j] is a variable assignment for that clique
Maximum likelihood
- Let θ_x = f(X = x), one parameter per clique configuration
- Optimization continued:
  θ* = argmax_θ Π_i [Π_j θ_{x_i[j]}] / [Σ_X Π_j θ_{x[j]}]
     = argmax_θ log Π_i [Π_j θ_{x_i[j]}] / [Σ_X Π_j θ_{x[j]}]
     = argmax_θ Σ_i [Σ_j log θ_{x_i[j]} - log Σ_X Π_j θ_{x[j]}]
- In θ, this is a non-concave optimization problem
Maximum likelihood
- Substitute λ = log θ and the problem becomes concave:
  λ* = argmax_λ Σ_i [Σ_j λ_{x_i[j]} - log Σ_X e^{Σ_j λ_{x[j]}}]
- Possible algorithms:
  – Gradient ascent (sketched below)
  – Conjugate gradient
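A minimal gradient-ascent sketch for this concave objective on an invented two-clique model over binary (A, B, C). The gradient of the log-likelihood with respect to λ_{j,c} is the empirical count of clique configuration c minus N times its expected count under the current λ; the data, cliques, and step size below are made up.

    import itertools, math

    # Invented model: cliques {A,B} and {B,C} over variables indexed 0,1,2.
    # lam[j][config] is the log-potential for clique j.  The data is made up.
    cliques = [(0, 1), (1, 2)]
    data = [(0, 0, 1), (0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 1, 0)]

    def log_score(lam, x):
        return sum(lam[j][tuple(x[v] for v in cl)] for j, cl in enumerate(cliques))

    def log_partition(lam):
        return math.log(sum(math.exp(log_score(lam, x))
                            for x in itertools.product([0, 1], repeat=3)))

    def gradient(lam):
        """Empirical clique-configuration counts minus N * expected counts."""
        grad = [{c: 0.0 for c in itertools.product([0, 1], repeat=2)}
                for _ in cliques]
        for x in data:                                   # empirical counts
            for j, cl in enumerate(cliques):
                grad[j][tuple(x[v] for v in cl)] += 1.0
        logZ = log_partition(lam)
        for x in itertools.product([0, 1], repeat=3):    # expected counts
            p = math.exp(log_score(lam, x) - logZ)
            for j, cl in enumerate(cliques):
                grad[j][tuple(x[v] for v in cl)] -= len(data) * p
        return grad

    lam = [{c: 0.0 for c in itertools.product([0, 1], repeat=2)} for _ in cliques]
    for _ in range(500):   # plain gradient ascent on the concave log-likelihood
        g = gradient(lam)
        for j in range(len(cliques)):
            for c in lam[j]:
                lam[j][c] += 0.05 * g[j][c]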
Feature-based Markov Networks
- Generalization of Markov networks:
  – May not have a corresponding graph
  – Use features and weights instead of potentials
  – Use an exponential representation
- Pr(X=x) = (1/k) e^{Σ_j λ_j φ_j(x[j])}
  where x[j] is a variable assignment for the subset of variables specific to φ_j
- Feature φ_j: a Boolean function that maps partial variable assignments to 0 or 1
- Weight λ_j: a real number
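A tiny feature-based network evaluated directly from this definition; the two features and their weights are invented for illustration.

    import itertools, math

    # Invented feature-based network over binary (A, B).
    phis = [lambda x: 1 if x[0] == x[1] else 0,   # phi_1: A and B agree
            lambda x: x[1]]                       # phi_2: B is true
    lams = [1.0, -0.5]                            # one weight per feature

    def unnorm(x):
        return math.exp(sum(l * phi(x) for l, phi in zip(lams, phis)))

    k = sum(unnorm(x) for x in itertools.product([0, 1], repeat=2))
    pr = {x: unnorm(x) / k for x in itertools.product([0, 1], repeat=2)}
    print(pr)  # a proper distribution: the four probabilities sum to 1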
Feature-based Markov Networks
- Potential-based Markov networks can always be converted to feature-based Markov networks:
  Pr(x) = (1/k) Π_j f_j(CLIQUE_j = x[j])
        = (1/k) e^{Σ_j Σ_c λ_{j,c} φ_{j,c}(x[j])}
  where c ranges over the configurations of clique j
- λ_{j,c} = log f_j(CLIQUE_j = c)
- φ_{j,c}(x[j]) = 1 if x[j] = c, 0 otherwise
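A sketch of this conversion applied to the potential f1 from the earlier example (one indicator feature per configuration, without the wildcard merging shown on the next slide). The assert checks that the exponential form reproduces f1 exactly; log 0 is represented as -inf, so impossible configurations get probability 0.

    import math

    # f1 from the potential example: keys are (C, S, R) with 1 = true.
    f1 = {(0, 0, 0): 7.0, (0, 0, 1): 0.0, (0, 1, 0): 2.5, (0, 1, 1): 0.0,
          (1, 0, 0): 5.5, (1, 0, 1): 5.0, (1, 1, 0): 2.5, (1, 1, 1): 3.0}

    # One indicator feature and one weight per clique configuration c:
    # phi_c(x) = 1 iff x = c, with weight lambda_c = log f1(c).
    weights = {c: math.log(v) if v > 0 else float('-inf') for c, v in f1.items()}

    def unnormalized(x):
        """exp(sum over firing features of lambda_c) reproduces f1(x)."""
        active = [weights[c] for c in f1 if x == c]   # exactly one indicator fires
        return math.exp(sum(active))

    assert all(abs(unnormalized(c) - v) < 1e-9 for c, v in f1.items())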
Example
f1(C,S,R):
  ~c~s~r: 7       c~s~r: 5.5
  ~c~sr:  0       c~sr:  5
  ~cs~r:  2.5     cs~r:  2.5
  ~csr:   0       csr:   3

Features and weights (* matches either value of the starred variable; configurations with equal potential values share one wildcard feature):
  φ_{1,~c~s~r}(CSR) = 1 if CSR = ~c~s~r, 0 otherwise    λ_{1,~c~s~r} = log 7
  φ_{1,~c*r}(CSR)   = 1 if CSR = ~c*r, 0 otherwise      λ_{1,~c*r}   = log 0
  φ_{1,*s~r}(CSR)   = 1 if CSR = *s~r, 0 otherwise      λ_{1,*s~r}   = log 2.5
  φ_{1,c~s~r}(CSR)  = 1 if CSR = c~s~r, 0 otherwise     λ_{1,c~s~r}  = log 5.5
  φ_{1,c~sr}(CSR)   = 1 if CSR = c~sr, 0 otherwise      λ_{1,c~sr}   = log 5
  φ_{1,csr}(CSR)    = 1 if CSR = csr, 0 otherwise       λ_{1,csr}    = log 3
Features
- Features:
  – Any Boolean function
  – Provide tremendous flexibility
- Example: text categorization
  – Simplest features: presence/absence of a word in a document
  – More complex features (two are sketched below):
    - Presence/absence of specific expressions
    - Presence/absence of two words within a certain window
    - Presence/absence of any combination of words
    - Presence/absence of a figure of speech
    - Presence/absence of any linguistic feature
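Two of these feature types written as Boolean functions; the word choices, window size, and naive whitespace tokenization are my own for illustration.

    # Two hypothetical Boolean text features.
    def word_present(word):
        """phi(doc) = 1 iff the word occurs in the document."""
        return lambda doc: 1 if word in doc.split() else 0

    def words_within_window(w1, w2, k):
        """phi(doc) = 1 iff w1 and w2 occur within k tokens of each other."""
        def phi(doc):
            toks = doc.split()
            pos1 = [i for i, t in enumerate(toks) if t == w1]
            pos2 = [i for i, t in enumerate(toks) if t == w2]
            return 1 if any(abs(i - j) <= k for i in pos1 for j in pos2) else 0
        return phi

    phi1 = word_present("hockey")
    phi2 = words_within_window("stanley", "cup", 2)
    doc = "the stanley cup is hockey's biggest prize"
    print(phi1(doc), phi2(doc))  # 0 1  ("hockey's" != "hockey" after naive split)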