
Graphical Models - Part II. Oliver Schulte, CMPT 726. Bishop PRML.



  1. Graphical Models - Part II. Oliver Schulte, CMPT 726. Bishop PRML Ch. 8

  2. Outline • Markov Random Fields • Inference

  4. Conditional Independence in Graphs (figure: example graphs over nodes a, b, c) • Recall that for Bayesian networks, conditional independence was a bit complicated • d-separation with head-to-head links • We would like to construct a graphical representation in which conditional independence reduces to straightforward path checking

  5. Markov Random Fields (figure: undirected graph over nodes A, B, C) • Markov random fields (MRFs) contain one node per variable • Undirected graph over these nodes • Conditional independence is given by simple separation: a path is blocked by observing a node on it • e.g. in the graph above, A ⊥⊥ B | C
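As a concrete illustration (my own sketch, not from the slides), the separation test can be written as a breadth-first search that treats observed nodes as blocked. The adjacency-dict representation and function names are assumptions; the example graph is the chain A - C - B from the slide, and the Markov blanket of a node (next slide) is just its set of neighbours in this representation.

```python
from collections import deque

def separated(adj, a, b, observed):
    """True if every path from a to b in the undirected graph `adj`
    (dict: node -> set of neighbours) passes through a node in `observed`,
    i.e. a and b are conditionally independent given `observed`."""
    observed = set(observed)
    seen = {a}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr in observed or nbr in seen:
                continue  # observed nodes block the path
            if nbr == b:
                return False  # found an unblocked path from a to b
            seen.add(nbr)
            queue.append(nbr)
    return True

def markov_blanket(adj, node):
    """In an MRF the Markov blanket of a node is simply its neighbours."""
    return set(adj[node])

# Chain A - C - B from the slide: observing C separates A from B.
adj = {'A': {'C'}, 'B': {'C'}, 'C': {'A', 'B'}}
print(separated(adj, 'A', 'B', {'C'}))  # True
print(separated(adj, 'A', 'B', set()))  # False
print(markov_blanket(adj, 'C'))         # {'A', 'B'}
```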

  6. Markov Blanket • With this simple check for conditional independence, the Markov blanket is also simple • Recall that the Markov blanket MB of x_i is the set of nodes such that x_i is conditionally independent of the rest of the graph given MB • In an MRF, the Markov blanket of a node is its neighbours

  7. MRF Factorization • Remember that graphical models define a factorization of the joint distribution • What should the factorization be so that we end up with the simple conditional independence check? • For x_i and x_j not connected by an edge in the graph: x_i ⊥⊥ x_j | x_{\{i,j}} (all the other variables) • So there should not be any factor ψ(x_i, x_j) in the factorized form of the joint

  8. Cliques (figure: graph over x_1, x_2, x_3, x_4) • A clique in a graph is a subset of nodes such that there is a link between every pair of nodes in the subset • A maximal clique is a clique to which one cannot add another node and have the set remain a clique
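A small Python sketch (my own; the adjacency-dict representation and the example edge set are assumptions, chosen so the maximal cliques are {x1, x2, x3} and {x2, x3, x4} as in the example on the following slides) that checks whether a set of nodes is a clique, and whether it is maximal.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of nodes in `nodes` is linked in the undirected graph `adj`."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def is_maximal_clique(adj, nodes):
    """True if `nodes` is a clique and no further node can be added while keeping it a clique."""
    nodes = set(nodes)
    if not is_clique(adj, nodes):
        return False
    # A node w outside the set could be added iff it is linked to every node in the set.
    return not any(nodes <= adj[w] for w in adj if w not in nodes)

adj = {'x1': {'x2', 'x3'}, 'x2': {'x1', 'x3', 'x4'},
       'x3': {'x1', 'x2', 'x4'}, 'x4': {'x2', 'x3'}}
print(is_clique(adj, {'x1', 'x2', 'x3'}))          # True
print(is_maximal_clique(adj, {'x1', 'x2'}))        # False: x3 can still be added
print(is_maximal_clique(adj, {'x2', 'x3', 'x4'}))  # True
```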

  9. MRF Joint Distribution • Note that nodes in a clique cannot be made conditionally independent from each other • So defining factors ψ(·) on the nodes in a clique is “safe” • The joint distribution for a Markov random field is: p(x_1, …, x_K) = (1/Z) ∏_C ψ_C(x_C), where x_C is the set of nodes in clique C, and the product runs over all maximal cliques • Each ψ_C(x_C) ≥ 0 • Z is a normalization constant

  10. MRF Joint - Terminology • The joint distribution for a Markov random field is: p(x_1, …, x_K) = (1/Z) ∏_C ψ_C(x_C) • Each ψ_C(x_C) ≥ 0 is called a potential function • Z, the normalization constant, is called the partition function: Z = Σ_x ∏_C ψ_C(x_C) • Z is very costly to compute, since it is a sum/integral over all possible states of all variables in x • We don't always need to evaluate it, though: it cancels when computing conditional probabilities
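To make the last point concrete, here is a short worked derivation (my own elaboration of the slide's claim) showing the partition function cancelling in a conditional probability. Split the variables into a query set x_A, an evidence set x_B, and the remaining variables x_R:

```latex
p(x_A \mid x_B)
  = \frac{p(x_A, x_B)}{p(x_B)}
  = \frac{\tfrac{1}{Z} \sum_{x_R} \prod_C \psi_C(x_C)}
         {\tfrac{1}{Z} \sum_{x_A, x_R} \prod_C \psi_C(x_C)}
  = \frac{\sum_{x_R} \prod_C \psi_C(x_C)}
         {\sum_{x_A, x_R} \prod_C \psi_C(x_C)}
```

The 1/Z factors cancel, so only sums over the unobserved variables remain.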

  11. MRF Joint Distribution Example (figure: graph over x_1, x_2, x_3, x_4 with maximal cliques {x_1, x_2, x_3} and {x_2, x_3, x_4}) • The joint distribution for this Markov random field is: p(x_1, …, x_4) = (1/Z) ∏_C ψ_C(x_C) = (1/Z) ψ_{123}(x_1, x_2, x_3) ψ_{234}(x_2, x_3, x_4) • Note that maximal cliques subsume smaller ones: ψ_{123}(x_1, x_2, x_3) could include ψ_{12}(x_1, x_2), though sometimes smaller cliques are explicitly used for clarity
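A minimal Python sketch of this example (my own; the potential functions below are made up for illustration and simply favour agreement within each clique) that evaluates the joint by brute force: enumerate all states of four binary variables, compute the product of the two clique potentials, and normalize by Z.

```python
from itertools import product
import math

# Hypothetical clique potentials over binary variables in {-1, +1}.
def psi_123(x1, x2, x3):
    return math.exp(0.5 * (x1 * x2 + x2 * x3 + x1 * x3))

def psi_234(x2, x3, x4):
    return math.exp(0.5 * (x2 * x3 + x3 * x4 + x2 * x4))

states = list(product([-1, +1], repeat=4))

# Partition function: sum of the unnormalized product over all 2^4 joint states.
Z = sum(psi_123(x1, x2, x3) * psi_234(x2, x3, x4) for x1, x2, x3, x4 in states)

def p(x1, x2, x3, x4):
    return psi_123(x1, x2, x3) * psi_234(x2, x3, x4) / Z

print(Z)
print(p(+1, +1, +1, +1))            # a high-probability state: all variables agree
print(sum(p(*s) for s in states))   # sanity check: the probabilities sum to 1.0
```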

  12. Hammersley-Clifford • The definition of the joint: p(x_1, …, x_K) = (1/Z) ∏_C ψ_C(x_C) • Note that we started with particular conditional independences • We then formulated the factorization based on clique potentials • This formulation resulted in the right conditional independences • The converse is true as well: any strictly positive distribution with the conditional independences given by the undirected graph can be represented using a product of clique potentials • This is the Hammersley-Clifford theorem

  13. Energy Functions • We often use the exponential, which is non-negative, to define potential functions: ψ_C(x_C) = exp{−E_C(x_C)} • The minus sign is by convention • E_C(x_C) is called an energy function • From physics: low energy = high probability • This exponential representation is known as the Boltzmann distribution

  14. Energy Functions - Intuition • The joint distribution nicely rearranges as p(x_1, …, x_K) = (1/Z) ∏_C ψ_C(x_C) = (1/Z) ∏_C exp{−E_C(x_C)} = (1/Z) exp{−Σ_C E_C(x_C)} • Intuition about potential functions: the ψ_C describe good (low energy) sets of states for adjacent nodes • An example of this is next

  15. Image Denoising • Consider the problem of trying to correct (denoise) an image that has been corrupted • Assume the image is binary • Observed (noisy) pixel values y_i ∈ {−1, +1} • Unobserved true pixel values x_i ∈ {−1, +1} • Another application: face sketch synthesis from photos, http://people.csail.mit.edu/xgwang/sketch.html

  16. Image Denoising - Graphical Model (figure: observed pixels y_i connected to true pixels x_i, with each x_i connected to its neighbouring true pixels) • Cliques containing each true pixel value x_i ∈ {−1, +1} and its observed value y_i ∈ {−1, +1} • The observed pixel value is usually the same as the true pixel value • Energy function −η x_i y_i, η > 0: lower energy (better) if x_i = y_i • Cliques containing adjacent true pixel values x_i, x_j • Nearby pixel values are usually the same • Energy function −β x_i x_j, β > 0: lower energy (better) if x_i = x_j

  17. Image Denoising - Graphical Model (figure: same model as above) • Complete energy function: E(x, y) = −β Σ_{i,j} x_i x_j − η Σ_i x_i y_i, where {i, j} ranges over pairs of adjacent pixels • Joint distribution: p(x, y) = (1/Z) exp{−E(x, y)} • Or, as potential functions ψ_n(x_i, x_j) = exp(β x_i x_j) and ψ_p(x_i, y_i) = exp(η x_i y_i): p(x, y) = (1/Z) ∏_{i,j} ψ_n(x_i, x_j) ∏_i ψ_p(x_i, y_i)
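A small Python sketch of this energy function (my own; the grid size and the values of β and η are arbitrary) for a binary image stored as a 2D list, with the pairwise term summed over horizontally and vertically adjacent pixels.

```python
def energy(x, y, beta=1.0, eta=2.0):
    """E(x, y) = -beta * sum over adjacent pairs of x_i * x_j
                 - eta  * sum over pixels of x_i * y_i,
    for latent image x and observed image y with values in {-1, +1}."""
    rows, cols = len(x), len(x[0])
    e = 0.0
    for i in range(rows):
        for j in range(cols):
            e -= eta * x[i][j] * y[i][j]           # data (observation) term
            if i + 1 < rows:
                e -= beta * x[i][j] * x[i + 1][j]  # vertical neighbour
            if j + 1 < cols:
                e -= beta * x[i][j] * x[i][j + 1]  # horizontal neighbour
    return e

# Noisy 3x3 observation with one flipped pixel; the smooth guess has lower energy.
y = [[+1, +1, +1], [+1, -1, +1], [+1, +1, +1]]
print(energy(y, y))                               # keep x = y (energy -22.0)
print(energy([[+1] * 3, [+1] * 3, [+1] * 3], y))  # flip the centre pixel back (-26.0)
```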

  18. Image Denoising - Inference • The denoising query is argmax_x p(x | y) • Two approaches: • Iterated conditional modes (ICM): hill climbing in x, one variable x_i at a time • Simple to compute: the Markov blanket is just the observation plus the neighbouring pixels • Graph cuts: formulate as a max-flow/min-cut problem; exact inference (for this graph)
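Below is a minimal ICM sketch in Python (my own illustration, reusing the energy parameters from the example above): repeatedly sweep over the pixels and set each x_i to whichever value gives the lower local energy given its Markov blanket, i.e. the observation y_i and the neighbouring pixels x_j.

```python
def icm_denoise(y, beta=1.0, eta=2.0, sweeps=10):
    """Iterated conditional modes for the binary denoising MRF: start from x = y,
    then greedily set each pixel to the value minimizing its local energy."""
    rows, cols = len(y), len(y[0])
    x = [row[:] for row in y]  # initialize with the noisy observation
    for _ in range(sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                # Local energy of setting x[i][j] = s is -s * (eta*y[i][j] + beta*nbr_sum),
                # so the best s is the sign of (eta*y[i][j] + beta*nbr_sum).
                nbr_sum = sum(x[i + di][j + dj]
                              for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                              if 0 <= i + di < rows and 0 <= j + dj < cols)
                best = +1 if eta * y[i][j] + beta * nbr_sum >= 0 else -1
                if best != x[i][j]:
                    x[i][j] = best
                    changed = True
        if not changed:
            break  # reached a local optimum of p(x | y)
    return x

y = [[+1, +1, +1], [+1, -1, +1], [+1, +1, +1]]
print(icm_denoise(y))  # the isolated -1 pixel is flipped back to +1
```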

  19. Converting Directed Graphs into Undirected Graphs (figure: directed chain x_1 → x_2 → … → x_{N−1} → x_N and the corresponding undirected chain) • Consider a simple directed chain graph: p(x) = p(x_1) p(x_2 | x_1) p(x_3 | x_2) … p(x_N | x_{N−1}) • We can convert it to an undirected graph: p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) … ψ_{N−1,N}(x_{N−1}, x_N), where ψ_{1,2} = p(x_1) p(x_2 | x_1), all other ψ_{k−1,k} = p(x_k | x_{k−1}), and Z = 1
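A quick numerical check of this conversion in Python (my own; the conditional probability tables are made up) for a three-variable binary chain: the product of the undirected potentials matches the directed joint, and Z comes out as 1.

```python
from itertools import product

# Hypothetical CPTs for a binary chain x1 -> x2 -> x3.
p1 = {0: 0.6, 1: 0.4}                                      # p(x1)
p2 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # p(x2 | x1), keyed by (x2, x1)
p3 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # p(x3 | x2), keyed by (x3, x2)

def joint_directed(x1, x2, x3):
    return p1[x1] * p2[(x2, x1)] * p3[(x3, x2)]

# Undirected potentials: psi_12 absorbs p(x1) p(x2|x1); psi_23 = p(x3|x2).
def psi_12(x1, x2): return p1[x1] * p2[(x2, x1)]
def psi_23(x2, x3): return p3[(x3, x2)]

Z = sum(psi_12(a, b) * psi_23(b, c) for a, b, c in product([0, 1], repeat=3))
print(Z)  # 1.0: the potentials are already normalized
print(all(abs(joint_directed(a, b, c) - psi_12(a, b) * psi_23(b, c)) < 1e-12
          for a, b, c in product([0, 1], repeat=3)))  # True
```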

  20. Converting Directed Graphs into Undirected Graphs • The chain was straightforward because, for each conditional p(x_i | pa_i), the nodes x_i ∪ pa_i were contained in one clique • Hence we could define that clique's potential to include that conditional • For a general directed graph we can force this to occur by “marrying” the parents • Add links between all parents in pa_i • This process is known as moralization, creating a moral graph

  21. Strong Morals (figure: directed graph over x_1, x_2, x_3, x_4 and its moral graph) • Start with the directed graph on the left • Add undirected edges between all parents of each node • Remove the directionality from the original edges
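A minimal moralization sketch in Python (my own; the graph representation and the example parent sets are assumptions based on the figure, taking x_4 to be the node whose parents x_1, x_2, x_3 get married): add edges between all parents of each node, then drop the direction of the original edges.

```python
from itertools import combinations

def moralize(parents):
    """Given a directed graph as {node: set of parents}, return the moral
    (undirected) graph as {node: set of neighbours}."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {n: set() for n in nodes}
    for child, ps in parents.items():
        for p in ps:                      # remove directionality of original edges
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):  # "marry" the parents of each node
            adj[a].add(b)
            adj[b].add(a)
    return adj

# Assumed example from the figure: x4 has parents x1, x2, x3.
parents = {'x1': set(), 'x2': set(), 'x3': set(), 'x4': {'x1', 'x2', 'x3'}}
moral = moralize(parents)
print(sorted(moral['x1']))  # ['x2', 'x3', 'x4']: x1 is now linked to the other parents
```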

  22. Constructing Potential Functions (figure: the moral graph from the previous slide) • Initialize all potential functions to 1 • In the moral graph, for each p(x_i | pa_i) there is at least one clique which contains all of x_i ∪ pa_i • Multiply p(x_i | pa_i) into the potential function of one of these cliques • Z = 1 again, since p(x) = ∏_C ψ_C(x_C) = ∏_i p(x_i | pa_i), which is already normalized
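For the example above (assuming, as the figure suggests, that x_4 is the only node with parents, namely x_1, x_2, x_3), the moral graph is fully connected, so there is a single maximal clique and every conditional is multiplied into its potential:

```latex
\psi_{1,2,3,4}(x_1, x_2, x_3, x_4)
  = p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3),
\qquad
p(\mathbf{x}) = \frac{1}{Z}\, \psi_{1,2,3,4}(x_1, x_2, x_3, x_4), \quad Z = 1.
```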

  23. Equivalence Between Graph Types • Note that the moralized undirected graph loses some of the conditional independence statements of the directed graph • Further, there are certain conditional independence assumptions which can be represented by directed graphs but cannot be represented by undirected graphs, and vice versa • Directed graph (the head-to-head structure A → C ← B): A ⊥⊥ B | ∅ and A ⊤⊤ B | C; this cannot be represented using an undirected graph • Undirected graph (a four-node loop with A and B on opposite corners): A ⊤⊤ B | ∅, A ⊥⊥ B | C ∪ D, C ⊥⊥ D | A ∪ B; this cannot be represented using a directed graph

