  1. ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics – Bayes Nets – (Finish) Structure Learning Readings: KF 18.4; Barber 9.5, 10.4 Dhruv Batra Virginia Tech

  2. Administrativia • HW1 – Out – Due in 2 weeks: Feb 17, Feb 19, 11:59pm – Please please please please start early – Implementation: TAN, structure + parameter learning – Please post questions on Scholar Forum. (C) Dhruv Batra 2

  3. Recap of Last Time (C) Dhruv Batra 3

  4. Learning Bayes nets

                         Known structure        Unknown structure
     Fully observable    Very easy              Hard
     Missing data        Somewhat easy (EM)     Very very hard

     Data x(1), …, x(m) → parameters (CPTs P(Xi | Pa_Xi)) and structure. (C) Dhruv Batra Slide Credit: Carlos Guestrin 4

  5. Types of Errors • Truth: BN over Flu, Allergy, Sinus, Nose, Headache • Recovered: candidate structures over the same variables, with extra or missing edges [figure] (C) Dhruv Batra 5

  6. Score-based approach • Data: <x1(1), …, xn(1)>, …, <x1(m), …, xn(m)> • For each possible structure over Flu, Allergy, Sinus, Nose, Headache: learn parameters, then score the structure (e.g. scores of -52, -60, -500 for three candidates) (C) Dhruv Batra Slide Credit: Carlos Guestrin 6

  7. How many graphs? • N vertices. • How many (undirected) graphs? • How many (undirected) trees? (C) Dhruv Batra 7
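The counts follow from choosing edge subsets and from Cayley's formula; a quick sanity check (illustrative Python, not from the slides):

```python
from math import comb

def num_undirected_graphs(n):
    # Each of the C(n, 2) possible edges is either present or absent.
    return 2 ** comb(n, 2)

def num_labeled_trees(n):
    # Cayley's formula: there are n^(n-2) labeled trees on n vertices.
    return n ** (n - 2)

print(num_undirected_graphs(5))  # 2**10 = 1024
print(num_labeled_trees(5))      # 5**3 = 125
```

The super-exponential growth of the graph count is why exhaustive scoring is hopeless, while the tree count, though still exponential, admits efficient optimization (Chow-Liu, below).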

  8. What’s a good score? • Score(G) = log-likelihood(G : D, θ_MLE) = log P(D | θ_MLE, G) (C) Dhruv Batra 8
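With MLE parameters, the score decomposes over families: each family contributes Σ N(x, pa) · log(N(x, pa) / N(pa)). A minimal sketch (the data format and function names are illustrative assumptions):

```python
import math
from collections import Counter

def loglik_score(data, parents):
    """Decomposable log-likelihood score log P(D | theta_MLE, G).
    `data`: list of dicts {var: value}; `parents`: var -> tuple of parent vars."""
    score = 0.0
    for xi, pa in parents.items():
        # Counts N(x, pa) and N(pa) give the MLE CPT entries N(x, pa)/N(pa).
        joint = Counter((s[xi], tuple(s[p] for p in pa)) for s in data)
        pa_counts = Counter(tuple(s[p] for p in pa) for s in data)
        for (x, u), n in joint.items():
            score += n * math.log(n / pa_counts[u])
    return score
```

Because every term is a count-weighted log-probability, the score is at most 0, and richer parent sets can never decrease it; this is exactly the overfitting problem on the next slide.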

  9. Information-theoretic interpretation of Maximum Likelihood Score • Implications: – Intuitive: higher mutual information → higher score – Decomposes over families in the BN (a node and its parents) – Same score for I-equivalent structures! (C) Dhruv Batra 9

  10. Log-Likelihood Score Overfits • Adding an edge only improves the score! – Thus, the MLE structure is the complete graph • Two fixes: – Restrict the space of graphs • say, only d parents allowed (d = 1 → trees) – Put priors on graphs • Prefer sparser graphs (C) Dhruv Batra 10

  11. Chow-Liu tree learning algorithm 1 • For each pair of variables Xi, Xj – Compute the empirical distribution: P̂(xi, xj) = Count(xi, xj) / m – Compute mutual information: I(Xi; Xj) = Σ_{xi,xj} P̂(xi, xj) log [ P̂(xi, xj) / (P̂(xi) P̂(xj)) ] • Define a graph – Nodes X1, …, Xn – Edge (i, j) gets weight I(Xi; Xj) (C) Dhruv Batra Slide Credit: Carlos Guestrin 11

  12. Chow-Liu tree learning algorithm 2 • Optimal tree BN – Compute maximum-weight spanning tree – Directions in the BN: pick any node as root, and direct edges away from root • breadth-first search defines directions (C) Dhruv Batra Slide Credit: Carlos Guestrin 12
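The two slides above can be sketched end to end: pairwise mutual information as edge weights, a greedy maximum-weight spanning tree, then a traversal from the root to direct the edges (a toy implementation; the tuple-based sample format and function names are assumptions):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    # Empirical mutual information I(Xi; Xj); samples are tuples.
    n = len(data)
    pij = Counter((s[i], s[j]) for s in data)
    pi = Counter(s[i] for s in data)
    pj = Counter(s[j] for s in data)
    return sum((c / n) * math.log(c * n / (pi[x] * pj[y]))
               for (x, y), c in pij.items())

def chow_liu(data, n_vars, root=0):
    # Step 1: weight every candidate edge by empirical mutual information.
    edges = sorted(combinations(range(n_vars), 2),
                   key=lambda e: mutual_information(data, *e), reverse=True)
    # Step 2: greedy maximum-weight spanning tree (Kruskal, with component
    # labels relabeled on each merge -- fine for small n).
    comp, tree = list(range(n_vars)), []
    for i, j in edges:
        if comp[i] != comp[j]:
            tree.append((i, j))
            old, new = comp[j], comp[i]
            comp = [new if c == old else c for c in comp]
    # Step 3: direct edges away from the chosen root by traversing the tree.
    adj = {v: [] for v in range(n_vars)}
    for i, j in tree:
        adj[i].append(j)
        adj[j].append(i)
    parents, frontier = {root: None}, [root]
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in parents:
                parents[v] = u
                frontier.append(v)
    return parents
```

Any traversal order works for step 3: in a tree, directing edges away from the root yields the same acyclic structure regardless of BFS vs. DFS.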

  13. Can we extend Chow-Liu? • Tree augmented naïve Bayes (TAN) [Friedman et al. ’97] – Naïve Bayes model overcounts, because correlation between features is not considered – Same as Chow-Liu, but score edges with conditional mutual information given the class: I(Xi; Xj | C) (C) Dhruv Batra Slide Credit: Carlos Guestrin 13
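TAN's edge weight, the empirical conditional mutual information I(Xi; Xj | C), could be estimated like this (a sketch; the tuple-based sample format, with the class variable at index `c`, is an assumption):

```python
import math
from collections import Counter

def conditional_mi(data, i, j, c):
    # Empirical I(Xi; Xj | C) = sum over (x, y, z) of
    # P(x, y, z) * log [ P(x, y | z) / (P(x | z) P(y | z)) ].
    n = len(data)
    pijc = Counter((s[i], s[j], s[c]) for s in data)
    pic = Counter((s[i], s[c]) for s in data)
    pjc = Counter((s[j], s[c]) for s in data)
    pc = Counter(s[c] for s in data)
    return sum((m / n) * math.log(m * pc[z] / (pic[(x, z)] * pjc[(y, z)]))
               for (x, y, z), m in pijc.items())
```

Plugging this weight into the Chow-Liu machinery (spanning tree over the features, class node kept as a parent of every feature) gives the TAN structure learner that HW1 asks for.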

  14. Plan for today • (Finish) BN Structure Learning – Bayesian Score – Heuristic Search – Efficient tricks with decomposable scores (C) Dhruv Batra 14

  15. Bayesian score • Bayesian view → prior distributions: – Over structures – Over parameters of a structure • Posterior over structures given data: P(G | D) ∝ P(D | G) P(G) (C) Dhruv Batra 15

  16. Structure Prior • Common choices: – Uniform: P(G) ∝ c – Sparsity prior: P(G) ∝ c^{|G|} (0 < c < 1 favors fewer edges) – Prior penalizing the number of parameters – P(G) should decompose like the family score (C) Dhruv Batra 16

  17. Parameter Prior and Integrals • Important Result: – If P(θ_G | G) is Dirichlet, then the integral has a closed form! – And it factorizes according to the families in G • Dirichlet marginal likelihood for multinomial P(Xi | pa_i):

     P(D | G) = ∏_i ∏_{pa_i^G} [ Γ(α(pa_i^G)) / Γ(α(pa_i^G) + N(pa_i^G)) ] ∏_{x_i} [ Γ(α(x_i, pa_i^G) + N(x_i, pa_i^G)) / Γ(α(x_i, pa_i^G)) ]

  (C) Dhruv Batra 17

  18. Parameter Prior and Integrals • How should we choose the Dirichlet hyperparameters? – K2 prior: fix an α, P(θ_{Xi | Pa_Xi}) = Dirichlet(α, …, α) • K2 is “inconsistent” (C) Dhruv Batra 18

  19. BDe Prior • BDe Prior – Remember that Dirichlet parameters are analogous to “fictitious samples” – Pick a fictitious sample size m′ – Pick a “prior” BN • Usually independent (product of marginals) – Compute P(Xi, Pa_Xi) under this prior BN • BDe prior: α(xi, pa_i) = m′ · P′(xi, pa_i) • Has the consistency property (C) Dhruv Batra 19
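Since the marginal likelihood factorizes over families, each family's term can be evaluated in log space with `lgamma`. A sketch, where the `alpha(x, u)` callback supplies the hyperparameters (e.g. m′ · P′(x, u) for BDe, or a constant for K2); the data format and names are illustrative:

```python
import math
from collections import Counter

def family_log_marginal(data, xi, pa, alpha):
    """Log Dirichlet marginal likelihood for one family Xi | Pa_i.
    `data`: list of dicts; `pa`: tuple of parent vars; `alpha(x, u)`:
    Dirichlet hyperparameter for Xi = x, Pa_i = u."""
    joint = Counter((s[xi], tuple(s[p] for p in pa)) for s in data)
    states = sorted({s[xi] for s in data})
    configs = sorted({tuple(s[p] for p in pa) for s in data})
    ll = 0.0
    for u in configs:
        a_u = sum(alpha(x, u) for x in states)   # alpha(pa_i = u)
        n_u = sum(joint[(x, u)] for x in states)  # N(pa_i = u)
        ll += math.lgamma(a_u) - math.lgamma(a_u + n_u)
        for x in states:
            ll += (math.lgamma(alpha(x, u) + joint[(x, u)])
                   - math.lgamma(alpha(x, u)))
    return ll
```

Summing this over all families (plus log P(G)) gives the full Bayesian score of a structure.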

  20. Chow-Liu for Bayesian score • Edge weight w_{Xj → Xi} is the advantage of adding Xj as a parent of Xi • Now we have a directed graph, so we need a directed spanning forest – Note that adding an edge can hurt the Bayesian score – choose a forest, not a tree – A maximum spanning forest algorithm works (C) Dhruv Batra 20

  21. Structure learning for general graphs • In a tree, a node only has one parent • Theorem : – The problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d ≥ 2 • Most structure learning approaches use heuristics – Exploit score decomposition – (Quickly) Describe two heuristics that exploit decomposition in different ways (C) Dhruv Batra 21

  22. Structure learning using local search • Start from a Chow-Liu tree • Local search over possible moves: – Add edge – Delete edge – Invert edge (only if the result is acyclic!!!) • Select moves using your favorite score (C) Dhruv Batra 22

  23. Structure learning using local search • Problems: – Local maximum – Plateau • Strategies – Random restart – Tabu list (C) Dhruv Batra 23

  24. Exploit score decomposition in local search • Add edge and delete edge: – Only rescore one family! • Reverse edge: – Rescore only two families [figure: local moves on the Flu/Allergy/Sinus/Nose/Headache network] (C) Dhruv Batra 24
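Under any decomposable score, these move deltas touch only the affected families. A sketch, with `score_family` standing in for a (typically cached) family-score lookup; all names are illustrative:

```python
def delta_add_edge(score_family, parents, j, i):
    # Adding Xj -> Xi changes only Xi's family term of a decomposable score;
    # every other family's cached score is reused unchanged.
    return score_family(i, parents[i] | {j}) - score_family(i, parents[i])

def delta_reverse_edge(score_family, parents, j, i):
    # Reversing Xj -> Xi rescores exactly two families:
    # Xi loses parent j, and Xj gains parent i.
    d = score_family(i, parents[i] - {j}) - score_family(i, parents[i])
    d += score_family(j, parents[j] | {i}) - score_family(j, parents[j])
    return d
```

With a toy score such as `score_family = lambda v, pa: -len(pa)` (penalize parents), adding an edge yields a delta of -1 while a reversal nets 0, illustrating that each move is scored in O(1) family evaluations rather than rescoring the whole graph.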

  25. Example Alarm network (C) Dhruv Batra 25

  26. Example • JamBayes [Horvitz et al UAI05] (C) Dhruv Batra 26

  27. Example • JamBayes [Horvitz et al UAI05] (C) Dhruv Batra 27

  28. Example • JamBayes [Horvitz et al] (C) Dhruv Batra 28

  29. Bayesian model averaging • So far, we have selected a single structure • But, if you are really Bayesian, must average over structures – Similar to averaging over parameters

  30. BN Structure Learning: What you need to know • Score-based approach – Log-likelihood score • Uses θ_MLE • Information-theoretic interpretation • Overfits! Adding edges only helps – Bayesian score • Priors over structures and over the parameters of a structure • If the parameter prior is Dirichlet, closed-form expression for P(D | G) • A K2 Dirichlet prior is not enough; need BDe for consistency • Structure search – For trees • Chow-Liu: max-weight spanning tree • Can be extended to forests and TAN – General graphs • Heuristic search • Efficiency tricks due to decomposable score (C) Dhruv Batra 30
