ECE 6504: Advanced Topics in Machine Learning


  1. ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics: – Bayes Nets: Representation/Semantics – d-separation, Local Markov Assumption – Markov Blanket – I-equivalence, (Minimal) I-Maps, P-Maps Readings: KF 3.2, 3.4 Dhruv Batra Virginia Tech

  2. Recap of Last Time

  3. A general Bayes net • Set of random variables • Directed acyclic graph – encodes independence assumptions • CPTs – Conditional Probability Tables • Joint distribution: P(X_1, …, X_n) = ∏_i P(X_i | Pa_{X_i}) [Figure: DAG with Flu → Sinus ← Allergy, Sinus → Nose, Sinus → Headache]
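A minimal sketch of this factorization for the Flu/Allergy/Sinus/Nose/Headache net above. All CPT numbers are invented for illustration; only the structure comes from the slide.

```python
# Joint distribution as a product of CPTs:
# P(F, A, S, N, H) = P(F) P(A) P(S | F, A) P(N | S) P(H | S).
# All probability values below are made up for illustration.

P_F = {True: 0.1, False: 0.9}                  # P(Flu)
P_A = {True: 0.2, False: 0.8}                  # P(Allergy)
P_S = {  # P(Sinus = True | Flu, Allergy)
    (True, True): 0.9, (True, False): 0.7,
    (False, True): 0.6, (False, False): 0.1,
}
P_N = {True: 0.8, False: 0.2}                  # P(Nose = True | Sinus = s)
P_H = {True: 0.6, False: 0.1}                  # P(Headache = True | Sinus = s)

def joint(f, a, s, n, h):
    """P(F=f, A=a, S=s, N=n, H=h) = P(f) P(a) P(s|f,a) P(n|s) P(h|s)."""
    p_s = P_S[(f, a)] if s else 1 - P_S[(f, a)]
    p_n = P_N[s] if n else 1 - P_N[s]
    p_h = P_H[s] if h else 1 - P_H[s]
    return P_F[f] * P_A[a] * p_s * p_n * p_h

# 5 binary variables: 2^5 = 32 joint entries, but only 1+1+4+2+2 = 10 CPT parameters.
total = sum(joint(f, a, s, n, h)
            for f in (True, False) for a in (True, False)
            for s in (True, False) for n in (True, False) for h in (True, False))
assert abs(total - 1.0) < 1e-9  # a valid distribution: the factors sum to 1
```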

  4. Independencies in Problem • World, data, reality: true distribution P contains independence assertions • BN: graph G encodes local independence assumptions Slide Credit: Carlos Guestrin

  5. Bayes Nets • BNs encode (conditional) independence assumptions – I(G) = {X indep of Y given Z} • Which ones? • And how can we easily read them?

  6. Local Structures • What’s the smallest Bayes Net?

  7. Local Structures • Indirect causal effect: X → Z → Y • Indirect evidential effect: X ← Z ← Y • Common cause: X ← Z → Y • Common effect: X → Z ← Y
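The common-effect (v-structure) case behaves unlike the other three: X and Y start out independent but become dependent once Z is observed ("explaining away"). A small numeric check with invented CPTs:

```python
from itertools import product

# X -> Z <- Y with made-up CPTs; X and Y are independent fair coins.
P_X = {0: 0.5, 1: 0.5}
P_Y = {0: 0.5, 1: 0.5}
P_Z1 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.6, (1, 1): 0.9}  # P(Z=1 | X, Y)

def joint(x, y, z):
    pz = P_Z1[(x, y)] if z == 1 else 1 - P_Z1[(x, y)]
    return P_X[x] * P_Y[y] * pz

# Marginally, P(X=1, Y=1) = P(X=1) P(Y=1): the unobserved v-structure blocks the path.
p_xy = sum(joint(1, 1, z) for z in (0, 1))
assert abs(p_xy - 0.5 * 0.5) < 1e-9

# Conditioned on Z=1, X and Y become dependent.
pz1 = sum(joint(x, y, 1) for x, y in product((0, 1), repeat=2))
p_x_given_z = sum(joint(1, y, 1) for y in (0, 1)) / pz1          # ≈ 0.68
p_x_given_yz = joint(1, 1, 1) / sum(joint(x, 1, 1) for x in (0, 1))  # = 0.60
print(p_x_given_z, p_x_given_yz)  # differ, so X and Y are dependent given Z
```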

  8. Bayes Ball Rules • Flow of information – on board

  9. Plan for today • Bayesian Networks: Semantics – d-separation – General (conditional) independence assumptions in a BN – Markov Blanket – (Minimal) I-map, P-map

  10. Active trails formalized • Let variables O ⊆ {X_1, …, X_n} be observed • A path X_1 – X_2 – ··· – X_k is an active trail if for each consecutive triplet: – X_{i-1} → X_i → X_{i+1}, and X_i is not observed (X_i ∉ O) – X_{i-1} ← X_i ← X_{i+1}, and X_i is not observed (X_i ∉ O) – X_{i-1} ← X_i → X_{i+1}, and X_i is not observed (X_i ∉ O) – X_{i-1} → X_i ← X_{i+1}, and X_i is observed (X_i ∈ O), or one of its descendants is observed Slide Credit: Carlos Guestrin
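A direct transcription of this definition into code (hypothetical helper names, stdlib only): given the DAG as a set of (parent, child) edges, a candidate path, and the observed set O, test every consecutive triplet against the four rules.

```python
def descendants(edges, node):
    """All descendants of `node` in a DAG given as a set of (parent, child) edges."""
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    out, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def is_active_trail(edges, path, observed):
    """True iff `path` (a node sequence forming a path in the skeleton) is active
    given the set `observed`."""
    for a, b, c in zip(path, path[1:], path[2:]):
        if (a, b) in edges and (b, c) in edges:        # a -> b -> c: causal
            blocked = b in observed
        elif (b, a) in edges and (c, b) in edges:      # a <- b <- c: evidential
            blocked = b in observed
        elif (b, a) in edges and (b, c) in edges:      # a <- b -> c: common cause
            blocked = b in observed
        else:                                          # a -> b <- c: common effect
            blocked = b not in observed and not (descendants(edges, b) & observed)
        if blocked:
            return False
    return True
```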

  11. An active trail – Example [Figure: example graph over nodes A, B, C, D, E, F, G, H with observed copies F′, F′′] When are A and H independent?

  12. d-Separation • Definition: variables X and Y are d-separated given Z if – there is no active trail between X and Y when the variables Z ⊆ {X_1, …, X_n} are observed [Figure: example DAG over nodes A–K] Slide Credit: Carlos Guestrin
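With the active-trail test in hand, d-separation reduces to checking that no trail between X and Y is active. A brute-force sketch that enumerates all simple paths in the skeleton; it reuses `is_active_trail` from the previous sketch, and is only meant for small graphs (real implementations use the linear-time Bayes-ball reachability algorithm instead).

```python
def all_simple_paths(edges, x, y):
    """All simple paths from x to y in the skeleton (edge directions ignored)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    def walk(path):
        if path[-1] == y:
            yield list(path)
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from walk(path + [n])
    yield from walk([x])

def d_separated(edges, x, y, observed):
    """X and Y are d-separated given `observed` iff no trail between them is active.
    Assumes is_active_trail from the sketch above."""
    return not any(is_active_trail(edges, p, observed)
                   for p in all_simple_paths(edges, x, y))
```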

  13. d-Separation • So what if X and Y are d-separated given Z?

  14. Factorization + d-sep ⇒ Independence • Theorem: – If • P factorizes over G • d-sep_G(X, Y | Z) – Then • P ⊨ (X ⊥ Y | Z) – Corollary: • I(G) ⊆ I(P) • All independence assertions read from G are correct!
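A quick numeric check of this soundness direction on the chain X → Z → Y (CPT values invented): X and Y are d-separated given Z, and in any P that factorizes over the chain, P(X, Y | Z) = P(X | Z) P(Y | Z) indeed holds.

```python
from itertools import product

# Chain X -> Z -> Y with made-up CPTs.
P_X1 = 0.3                  # P(X=1)
P_Z1 = {0: 0.2, 1: 0.7}     # P(Z=1 | X)
P_Y1 = {0: 0.4, 1: 0.9}     # P(Y=1 | Z)

def joint(x, z, y):
    px = P_X1 if x else 1 - P_X1
    pz = P_Z1[x] if z else 1 - P_Z1[x]
    py = P_Y1[z] if y else 1 - P_Y1[z]
    return px * pz * py

for z in (0, 1):
    pz = sum(joint(x, z, y) for x, y in product((0, 1), repeat=2))
    for x, y in product((0, 1), repeat=2):
        lhs = joint(x, z, y) / pz                            # P(x, y | z)
        rhs = (sum(joint(x, z, yy) for yy in (0, 1)) / pz    # P(x | z)
               * sum(joint(xx, z, y) for xx in (0, 1)) / pz) # P(y | z)
        assert abs(lhs - rhs) < 1e-9                         # (X ⊥ Y | Z) holds
```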

  15. More generally: Completeness of d-separation • Theorem: Completeness of d-separation – For “almost all” distributions P that factorize over G – we have that I(G) = I(P) • “almost all” distributions: except for a set of measure zero of CPTs • Means that if X and Y are not d-separated given Z, then P ⊭ (X ⊥ Y | Z) Slide Credit: Carlos Guestrin

  16. Local Markov Assumption • A variable X is independent of its non-descendants given its parents and only its parents: (X_i ⊥ NonDescendants_{X_i} | Pa_{X_i}) [Figure: Flu/Allergy/Sinus/Nose/Headache net]
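Reading this assumption off a graph requires each variable's parents and non-descendant set; a small stdlib-only sketch of the latter, using the Flu net from earlier as the example:

```python
def non_descendants(edges, node):
    """Nodes that are neither `node` nor reachable from it via directed edges."""
    children, nodes = {}, set()
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        nodes |= {u, v}
    desc, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in desc:
                desc.add(c)
                stack.append(c)
    return nodes - desc - {node}

edges = {("Flu", "Sinus"), ("Allergy", "Sinus"),
         ("Sinus", "Nose"), ("Sinus", "Headache")}
# Nose's non-descendants are {Flu, Allergy, Sinus, Headache}; its only parent is
# Sinus, so the local Markov assumption reads (Nose ⊥ Flu, Allergy, Headache | Sinus).
print(non_descendants(edges, "Nose"))
```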

  17. Markov Blanket • Markov Blanket of a variable – its parents, children, and parents of its children (co-parents) [Figure: Markov blanket of variable x_8] Slide Credit: Simon J.D. Prince
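The blanket is read directly off the edge list. A sketch on a small hypothetical DAG:

```python
def markov_blanket(edges, node):
    """Parents, children, and co-parents of `node` in a DAG of (parent, child) edges."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    coparents = {u for u, v in edges if v in children and u != node}
    return parents | children | coparents

edges = {("A", "C"), ("B", "C"), ("C", "D"), ("E", "D")}
print(markov_blanket(edges, "C"))  # {'A', 'B', 'D', 'E'}: E is a co-parent via D
```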

  18. Example • A variable is conditionally independent of all others, given its Markov Blanket Slide Credit: Simon J.D. Prince

  19. I-map • Independency map • Definition: – If I(G) ⊆ I(P) – then G is an I-map of P

  20. Factorization + d-sep ⇒ Independence • Theorem: – If • P factorizes over G • d-sep_G(X, Y | Z) – Then • P ⊨ (X ⊥ Y | Z) – Corollary: • I(G) ⊆ I(P) • G is an I-map of P • All independence assertions read from G are correct!

  21. The BN Representation Theorem • If G is an I-map of P, then P factorizes over G – Important because: every P has at least one BN structure G – Homework 1! • If P factorizes over G, then G is an I-map of P – Important because: we can read independencies of P from the BN structure G Slide Credit: Carlos Guestrin

  22. I-Equivalence • Two graphs G_1 and G_2 are I-equivalent if – I(G_1) = I(G_2) • Equivalence classes of BN structures – a mutually exclusive and exhaustive partition of graphs
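A standard graph-level characterization (Verma and Pearl; not stated on the slide): two DAGs are I-equivalent iff they have the same skeleton and the same immoralities, i.e. v-structures X → Z ← Y with X and Y non-adjacent. A sketch:

```python
def skeleton(edges):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in edges}

def immoralities(edges):
    """V-structures x -> z <- y where x and y are non-adjacent."""
    skel = skeleton(edges)
    return {(x, z, y)
            for x, z in edges for y, z2 in edges
            if z == z2 and x < y and frozenset((x, y)) not in skel}

def i_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

# X -> Z -> Y, X <- Z <- Y, and X <- Z -> Y are all I-equivalent;
# the v-structure X -> Z <- Y encodes different independencies.
chain   = {("X", "Z"), ("Z", "Y")}
fork    = {("Z", "X"), ("Z", "Y")}
vstruct = {("X", "Z"), ("Y", "Z")}
print(i_equivalent(chain, fork))     # True
print(i_equivalent(chain, vstruct))  # False
```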

  23. Minimal I-maps &amp; P-maps • Many possible I-maps • Is there a “simplest” I-map? • Yes, two directions – Minimal I-maps – P-maps

  24. Minimal I-map • G is a minimal I-map for P if – deleting any edge from G makes it no longer an I-map

  25. P-map • Perfect map • G is a P-map for P if – I(P) = I(G) • Question: does every distribution P have a P-map?

  26. BN Representation: What you need to know • Bayesian networks – a compact representation for large probability distributions – not an algorithm • Representation – BNs represent (conditional) independence assumptions – BN structure = family of distributions – BN structure + CPTs = a single distribution • Concepts – Active trails (flow of information); d-separation – Local Markov Assumption, Markov Blanket – I-map, P-map – BN Representation Theorem (I-map ⇔ Factorization)

  27. Main Issues in PGMs • Representation – How do we store P(X_1, X_2, …, X_n)? – What does my model mean/imply/assume? (semantics) • Learning – How do we learn the parameters and structure of P(X_1, X_2, …, X_n) from data? – Which model is right for my data? • Inference – How do I answer questions/queries with my model? Such as – Marginal estimation: P(X_5 | X_1, X_4) – Most probable explanation (MPE): argmax P(X_1, X_2, …, X_n)
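For small models, both query types can be answered by brute-force enumeration over the joint, which is exponential in n and exactly why dedicated inference algorithms exist. A sketch, assuming a hypothetical `joint` function that maps a tuple of n binary values to a probability:

```python
from itertools import product

def enumerate_query(joint, n, query_var, evidence):
    """P(X_query | evidence) by summing the joint over consistent assignments.
    `evidence` maps variable indices to their observed values."""
    weights = {0: 0.0, 1: 0.0}
    for assign in product((0, 1), repeat=n):
        if all(assign[i] == v for i, v in evidence.items()):
            weights[assign[query_var]] += joint(assign)
    z = weights[0] + weights[1]
    return {v: w / z for v, w in weights.items()}

def most_probable_explanation(joint, n, evidence):
    """argmax over complete assignments consistent with the evidence."""
    candidates = (a for a in product((0, 1), repeat=n)
                  if all(a[i] == v for i, v in evidence.items()))
    return max(candidates, key=joint)
```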

  28. Learning Bayes Nets

                            Known structure       Unknown structure
      Fully observable data  Very easy             Hard
      Missing data           Somewhat easy (EM)    Very very hard

      Given data x^(1), …, x^(m), learn the structure and the CPT parameters P(X_i | Pa_{X_i}). Slide Credit: Carlos Guestrin
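The "very easy" cell, known structure with fully observed data, is just counting: the maximum-likelihood estimate of each CPT entry is a ratio of counts. A sketch over a hypothetical dataset of dicts:

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by counting in fully observed data.
    `data` is a list of dicts mapping variable names to values."""
    joint_counts = Counter((tuple(row[p] for p in parents), row[child])
                           for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    return {(pa, c): n / parent_counts[pa] for (pa, c), n in joint_counts.items()}

# Hypothetical samples x^(1) .. x^(m) from the Flu/Sinus fragment:
data = [{"Flu": 1, "Sinus": 1}, {"Flu": 1, "Sinus": 0},
        {"Flu": 0, "Sinus": 0}, {"Flu": 0, "Sinus": 0}]
print(mle_cpt(data, "Sinus", ["Flu"]))
# {((1,), 1): 0.5, ((1,), 0): 0.5, ((0,), 0): 1.0}
```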
