 
Cmput 651 - Undirected Models 1, 17/10/08
Cmput 651 – Probabilistic Graphical Models

Probabilistic Graphical Models (Cmput 651): Undirected Graphical Models 1
Matthew Brown
17/10/2008

Space of Topics
• Semantics, Inference, Learning
• Directed, Undirected
• Discrete, Continuous

What is an undirected model (or Markov net)?
Some examples:
[Figure: example undirected graphs]
The graph structure has undirected edges (understandably).

Why use undirected models? (Misconception Example)
[Figure: three graphs over nodes A, B, C, D, captioned "This works:", "NO! This implies:", "NO! This implies:"]
  A ⊥ C | B, D      ¬(A ⊥ C | B, D)      A ⊥ C | B, D
  B ⊥ D | A, C      ¬(B ⊥ D | A, C)      B ⊥ D | A, C
Bayesian networks cannot represent some distributions.
• Misconception example from Koller-Friedman (Fig 3.16)

Why use undirected models? (Lab Dynamics Example)
[Figure: two graphs over nodes A to F, one labelled "Two-way communication" and one labelled "Top-down communication". Levels: Professor (A), PhD Students (B, C), MSc Students (D, E), Undergrad (F)]
Some distributions can be represented by both Bayes and Markov nets.
Sometimes the Markov net is more natural (namely, when there is no obvious directionality).
(We'll come back to Bayes vs. Markov nets later.)

Outline
• What are Markov networks
• Relating undirected graphs and PDFs
• Beyond Markov networks
• Bringing it all together: tumour segmentation example
• Markov nets vs. Bayes nets

Parameterization: Recall Bayes net CPDs
[Figure: Bayes net over nodes A, B, C, D]

  a    P(B=1|A=a)
  0    0.6
  1    0.8

  b, d    P(C=1|B=b,D=d)
  0, 0    0.1
  0, 1    0.9
  1, 0    0.5
  1, 1    0.5

Recall: In Bayes nets, conditional probability distributions (CPDs) describe the relationship between nodes joined by a (directed) edge.

Parameterization: Factors
[Figure: Markov net over nodes A, B, C, D with its factor tables, KF Fig 4.1]
Factors describe the weighting between connected nodes.
Factor values are always ≥ 0.
Factors are not necessarily normalized.
• PDFs and CPDs are special cases

Parameterization: Factors and Factorization
[Figure: Markov net over nodes A, B, C, D]
The probability distribution is derived by multiplying the factors and then normalizing the product.

$$P(a,b,c,d) = \frac{1}{Z}\,\phi_1[a,b]\cdot\phi_2[b,c]\cdot\phi_3[c,d]\cdot\phi_4[a,d]$$

$$Z = \sum_{a,b,c,d}\phi_1[a,b]\cdot\phi_2[b,c]\cdot\phi_3[c,d]\cdot\phi_4[a,d]$$

Any probability distribution that can be expressed as a normalized product of factors in this way is called a Gibbs distribution.
(We'll come back to this below; also see KF Definition 4.3.4.)

Parameterization: Example (slide 1/2)
[Figure: Markov net over nodes A, B, C, D with its factor tables, KF Fig 4.1]
What happens when you multiply over all the factors below? (answer on next slide)

Parameterization: Example
Product over factors
[Figure: Markov net over nodes A, B, C, D with the table of the factor product]

Parameterization: Factor products

$$\phi[A,B,C] = \phi_1[A,B]\cdot\phi_2[B,C]$$

Match up the shared variable assignments (B in this example).
[Figure: tables for $\phi_1[A,B]$, $\phi_2[B,C]$ and their product, KF Fig 4.3]
(also see Join from RG's slides: Variable elimination, slide 25)
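
The factor product can be sketched in a few lines of Python. This is only an illustration: the helper name `factor_product`, the dictionary representation of a factor, and the numeric values are assumptions made for the example and are not the values from KF Fig 4.3.

```python
from itertools import product

def factor_product(scope1, table1, scope2, table2, domains):
    """Multiply two factors stored as {assignment tuple: value} tables."""
    # Scope of the result: variables of the first factor, then any new ones
    # contributed by the second factor.
    scope = list(scope1) + [v for v in scope2 if v not in scope1]
    table = {}
    for values in product(*(domains[v] for v in scope)):
        assignment = dict(zip(scope, values))
        # Shared variables (B here) take the same value in both lookups.
        key1 = tuple(assignment[v] for v in scope1)
        key2 = tuple(assignment[v] for v in scope2)
        table[values] = table1[key1] * table2[key2]
    return scope, table

domains = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
# Hypothetical factor values, chosen only to make the arithmetic easy to follow.
phi1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}  # phi1[A, B]
phi2 = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}  # phi2[B, C]

scope, phi = factor_product(["A", "B"], phi1, ["B", "C"], phi2, domains)
print(scope)           # ['A', 'B', 'C']
print(phi[(0, 1, 0)])  # phi1[A=0,B=1] * phi2[B=1,C=0] = 2.0 * 7.0 = 14.0
```

The result has one entry per joint assignment of A, B, C, and it is not normalized.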

Parameterization: General thoughts
General factors and Markov nets:
• Advantage: not normalized
  • Computations are easier; no need to normalize until the end
• Disadvantage: not normalized
  • Harder to intuit how changes to a factor affect the whole PDF
  • Harder to train

Factorization of PDFs: Gibbs distributions
A PDF $P(X_1,\dots,X_n)$ is a Gibbs distribution if it factorizes thus:

$$P(X_1,\dots,X_n) = \frac{1}{Z}\,\phi_1[D_1]\cdot\phi_2[D_2]\cdot\ldots\cdot\phi_m[D_m]$$

$$Z = \sum_{X_1,\dots,X_n}\phi_1[D_1]\cdot\phi_2[D_2]\cdot\ldots\cdot\phi_m[D_m]$$

where $D_1$, $D_2$, etc. are (possibly overlapping) subsets of $X_1,\dots,X_n$.
$D_i$ is called the scope of factor $\phi_i$.
$Z$ is the partition function and normalizes the factor product in the numerator.
(also see KF Definition 4.3.4)
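
As a rough illustration of the definition, the sketch below computes the partition function and the normalized distribution by brute-force enumeration. The function name `gibbs_distribution`, the `agree` factor, and its values are assumptions made for this example; the four-node loop mirrors the A, B, C, D network used earlier, and enumeration is only practical for very small models.

```python
from itertools import product

def gibbs_distribution(variables, domains, factors):
    """Return (Z, P) for factors given as (scope, function) pairs."""
    unnormalized = {}
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        weight = 1.0
        for scope, fn in factors:
            # Each factor only looks at the variables in its scope D_i.
            weight *= fn(*(assignment[v] for v in scope))
        unnormalized[values] = weight
    Z = sum(unnormalized.values())  # partition function
    return Z, {k: w / Z for k, w in unnormalized.items()}

def agree(x, y):
    # Hypothetical pairwise factor: neighbours prefer to take the same value.
    return 10.0 if x == y else 1.0

variables = ["A", "B", "C", "D"]
domains = {v: [0, 1] for v in variables}
factors = [(("A", "B"), agree), (("B", "C"), agree),
           (("C", "D"), agree), (("A", "D"), agree)]

Z, P = gibbs_distribution(variables, domains, factors)
print(Z)                # 21202.0
print(P[(0, 0, 0, 0)])  # 10000 / 21202
```

The individual factors are never normalized; only the final product is divided by Z.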

Factorization of PDFs
A PDF $P(X_1,\dots,X_n)$ factorizes over a Markov net H if
1. $P(X_1,\dots,X_n)$ is a Gibbs distribution:

$$P(X_1,\dots,X_n) = \frac{1}{Z}\,\phi_1[D_1]\cdot\phi_2[D_2]\cdot\ldots\cdot\phi_m[D_m]$$

$$Z = \sum_{X_1,\dots,X_n}\phi_1[D_1]\cdot\phi_2[D_2]\cdot\ldots\cdot\phi_m[D_m]$$

and
2. $D_1$, $D_2$, etc. are (maximal or non-maximal) cliques of H.
Recall:
clique = a complete (fully-connected) subgraph of H
maximal clique = clique that is not a subgraph of a larger clique
(K&F use the terms “clique” and “subclique” for what Russ and I (and the graphical modeling community) call “maximal clique” and “clique”.)

Cliques
[Figure: undirected graph over nodes A, B, C, D, E, F, G]
Maximal cliques:
{A,E}
{B,C,D,E}
{D,E,F}
{D,F,G}
Examples of cliques:
{A}, {B}, {C}, etc. (i.e. single nodes)
{B,C}, {B,D}, {B,E}, {E,D}, {F,G}, etc.
{B,C,D}, {C,D,E}, etc.
{D,E,F,G} is NOT a clique (no E-G edge)
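
The clique definitions can be checked mechanically. In the sketch below the edge list is an assumption: it is reconstructed from the four maximal cliques listed on the slide, since the figure itself is not reproduced here. The helper names `is_clique` and `is_maximal_clique` are hypothetical, and the brute-force enumeration is only sensible for a graph this small.

```python
from itertools import combinations

# Assumed edge list, reconstructed from the maximal cliques
# {A,E}, {B,C,D,E}, {D,E,F}, {D,F,G}.
EDGES = [("A", "E"),
         ("B", "C"), ("B", "D"), ("B", "E"), ("C", "D"), ("C", "E"), ("D", "E"),
         ("D", "F"), ("E", "F"),
         ("D", "G"), ("F", "G")]
NODES = sorted({n for edge in EDGES for n in edge})
ADJ = {n: set() for n in NODES}
for u, v in EDGES:
    ADJ[u].add(v)
    ADJ[v].add(u)

def is_clique(nodes):
    """True if every pair of the given nodes is joined by an edge."""
    return all(v in ADJ[u] for u, v in combinations(nodes, 2))

def is_maximal_clique(nodes):
    """True if nodes form a clique that no other node can extend."""
    nodes = set(nodes)
    if not nodes or not is_clique(nodes):
        return False
    return not any(nodes <= ADJ[other] for other in set(NODES) - nodes)

# Brute-force enumeration over all non-empty subsets of the 7 nodes.
maximal = [set(c) for r in range(1, len(NODES) + 1)
           for c in combinations(NODES, r) if is_maximal_clique(c)]

print(is_clique({"B", "C", "D"}))       # True
print(is_clique({"D", "E", "F", "G"}))  # False, there is no E-G edge
print(sorted(sorted(c) for c in maximal))
```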

Factorization of PDFs: example
[Figure: undirected graph over nodes A, B, C, D, E]

$$P(A,B,C,D,E) = \frac{1}{Z}\,\phi_1[A,B]\cdot\phi_2[B,C,D]\cdot\phi_3[B,D,E]\cdot\phi_4[B,C]\cdot\phi_5[B,D]\cdot\phi_6[C,D]\cdot\phi_7[B,E]\cdot\phi_8[D,E]\cdot\phi_9[A]\cdot\phi_{10}[B]\cdot\phi_{11}[C]\cdot\phi_{12}[D]\cdot\phi_{13}[E]$$

NOTE: There is more than one way to define the factors. For example, one can use only the maximal cliques (because the maximal clique factors can subsume the (sub)clique factors):

$$P(A,B,C,D,E) = \frac{1}{Z}\,\psi_1[A,B]\cdot\psi_2[B,C,D]\cdot\psi_3[B,D,E]$$

Independence: global Markov assumption
[Figure: undirected graph over nodes A, B, C, D, E, F, G; fill denotes conditioning]
Active path: a path from X to Y with no conditioned nodes on it.
Global Markov assumption: if there is no active path from X to Y after conditioning on some set of nodes Z, then X and Y are independent given Z.
(A ⊥ B | E)
(A ⊥ {B,C,D,F,G} | E)
¬(B ⊥ G | E)
Also see KF section 4.3.1

Independence: global Markov assumption
[Figure: undirected graph over nodes A, B, C, D, E, F, G; fill denotes conditioning]
More examples:
({B,C} ⊥ {F,G} | D,E)
(A ⊥ {B,C,F,G} | D,E)

Independence: global Markov assumption
[Figure: undirected graph over nodes A, B, C, D, E, F, G]
With no conditioning, a connected graph implies no independencies among any of its nodes, e.g.:
¬(A ⊥ {B,C,D,E,F,G} | {})
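
The independence examples above can be reproduced with a small reachability check: X and Y are separated given Z when a breadth-first search from X that never enters Z cannot reach Y. The edge list is again an assumption, reconstructed from the maximal cliques listed on the Cliques slide, and the function name `separated` is hypothetical.

```python
from collections import deque

# Assumed edge list, reconstructed from the maximal cliques
# {A,E}, {B,C,D,E}, {D,E,F}, {D,F,G}.
EDGES = [("A", "E"),
         ("B", "C"), ("B", "D"), ("B", "E"), ("C", "D"), ("C", "E"), ("D", "E"),
         ("D", "F"), ("E", "F"),
         ("D", "G"), ("F", "G")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

def separated(adj, xs, ys, zs):
    """True if no active path joins any node in xs to any node in ys given zs."""
    blocked = set(zs)
    frontier = deque(x for x in xs if x not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False  # reached Y without passing through a conditioned node
        for neighbour in adj[node]:
            if neighbour not in blocked and neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return True

print(separated(ADJ, {"A"}, {"B"}, {"E"}))                  # True
print(separated(ADJ, {"B"}, {"G"}, {"E"}))                  # False (path B-D-G)
print(separated(ADJ, {"B", "C"}, {"F", "G"}, {"D", "E"}))   # True
print(separated(ADJ, {"A"}, set("BCDEFG"), set()))          # False
```

Separation in the graph is what the global Markov assumption turns into an independence statement; the check says nothing about independencies that hold in a particular distribution but are not implied by the graph.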

Independence: global Markov assumption
[Figure: the graph over nodes A, B, C, D, E, F, G plus a separate component over nodes H, I, J, K]
Without conditioning, only non-connected graphs have independencies.
({H,I,J,K} ⊥ {A,B,C,D,E,F,G} | {})

Global Markov independence is monotonic
[Figure: two copies of a graph over nodes A to J; fill denotes conditioning]
Left: After conditioning on F and G, {A,B} is independent of {H,I,J}.
Right: Adding more nodes to the conditioned set does NOT change this.
Monotonicity: Adding more nodes to the conditioned set (cut set) never removes an independence relation that already holds in a Markov net.