ECE 6504: Advanced Topics in Machine Learning
Probabilistic Graphical Models and Large-Scale Learning
Dhruv Batra Virginia Tech
Topics
– Bayes Nets – (Finish) Structure Learning
Readings: KF 18.4; Barber 9.5, 10.4
Administrativia
HW1
– Out
– Due in 2 weeks: Feb 17, Feb 19, 11:59pm
– Please please please please start early
– Implementation: TAN, structure + parameter learning
– Please post questions on Scholar Forum.
(C) Dhruv Batra 2
[Figure: data $x^{(1)}, \ldots, x^{(m)}$; learned structure and parameters]
Slide Credit: Carlos Guestrin
[Figure: data $\langle x_1^{(1)}, \ldots, x_n^{(1)} \rangle, \ldots, \langle x_1^{(m)}, \ldots, x_n^{(m)} \rangle$; possible structures over Flu, Allergy, Sinus, Headache, Nose; score structure; learn parameters]
Slide Credit: Carlos Guestrin
[Figure: two candidate structures over Flu, Allergy, Sinus, Headache, Nose; for each: score structure, learn parameters]
– Intuitive: higher mutual info → higher score
– Decomposes over families in BN (node and its parents)
– Same score for I-equivalent structures!
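For reference, the identity behind these bullets can be written out as below. This is the standard information-theoretic form of the maximum-likelihood score; the notation ($m$ samples, $\hat{I}$ and $\hat{H}$ for mutual information and entropy under the empirical distribution) is assumed here rather than taken from the slides.

```latex
\log \hat{P}(\mathcal{D} \mid \hat{\theta}_G, G)
  = m \sum_{i=1}^{n} \hat{I}\big(X_i ; \mathrm{Pa}^{G}_{X_i}\big)
  - m \sum_{i=1}^{n} \hat{H}(X_i)
```

The entropy term does not depend on $G$, so comparing structures only requires the per-family mutual information terms.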
– Thus, MLE = complete graph
– Restrict space of graphs
– Put priors on graphs
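– Compute empirical distribution: $\hat{P}(x_i, x_j) = \mathrm{Count}(x_i, x_j) / m$
– Compute mutual information: $\hat{I}(X_i, X_j) = \sum_{x_i, x_j} \hat{P}(x_i, x_j) \log \frac{\hat{P}(x_i, x_j)}{\hat{P}(x_i)\,\hat{P}(x_j)}$
– Nodes $X_1, \ldots, X_n$
– Edge $(i, j)$ gets weight $\hat{I}(X_i, X_j)$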
Slide Credit: Carlos Guestrin
– Compute maximum weight spanning tree
– Directions in BN: pick any node as root, and direct edges away from root
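The Chow-Liu steps can be sketched in a few lines of Python. This is a minimal illustration, assuming fully observed discrete data stored as a list of equal-length tuples; the function names `mutual_information` and `chow_liu_tree` and the use of Prim's algorithm are choices made here, not something the slides prescribe.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    m = len(data)
    pair = Counter((row[i], row[j]) for row in data)
    ci = Counter(row[i] for row in data)
    cj = Counter(row[j] for row in data)
    # P(a,b) * log(P(a,b) / (P(a) P(b))), written with raw counts.
    return sum(
        (c / m) * math.log(c * m / (ci[a] * cj[b]))
        for (a, b), c in pair.items()
    )


def chow_liu_tree(data, root=0):
    """Return directed edges (parent, child) of a Chow-Liu tree."""
    n = len(data[0])
    weights = {(i, j): mutual_information(data, i, j)
               for i, j in combinations(range(n), 2)}
    # Prim's algorithm for a maximum weight spanning tree, grown from the
    # root so that every edge is already directed away from the root.
    in_tree = {root}
    edges = []
    while len(in_tree) < n:
        u, v = max(
            ((u, v) for u in in_tree for v in range(n) if v not in in_tree),
            key=lambda e: weights[min(e), max(e)],
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges
```

Growing the tree from the chosen root gives the edge directions for free; any other root yields an I-equivalent tree.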
Slide Credit: Carlos Guestrin
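– Naïve Bayes model overcounts, because correlation between features is not considered
– Same as Chow-Liu, but score edges with the conditional mutual information given the class $C$: $\hat{I}(X_i, X_j \mid C) = \sum_{c, x_i, x_j} \hat{P}(c, x_i, x_j) \log \frac{\hat{P}(x_i, x_j \mid c)}{\hat{P}(x_i \mid c)\,\hat{P}(x_j \mid c)}$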
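A minimal sketch of the TAN edge weight, i.e. the empirical conditional mutual information between two features given the class. The function name `conditional_mutual_information` and the data layout (a list of feature tuples plus a parallel list of class labels) are assumptions made for illustration.

```python
import math
from collections import Counter


def conditional_mutual_information(data, labels, i, j):
    """Empirical I(X_i; X_j | C) in nats; labels[k] is the class of data[k]."""
    m = len(data)
    n_c = Counter(labels)
    n_ic = Counter((row[i], c) for row, c in zip(data, labels))
    n_jc = Counter((row[j], c) for row, c in zip(data, labels))
    n_ijc = Counter((row[i], row[j], c) for row, c in zip(data, labels))
    # P(a,b,c) * log(P(a,b|c) / (P(a|c) P(b|c))), written with raw counts.
    return sum(
        (n / m) * math.log(n * n_c[c] / (n_ic[a, c] * n_jc[b, c]))
        for (a, b, c), n in n_ijc.items()
    )
```

When both features are copies of the class label, the marginal mutual information between them is large but the conditional version is zero, so TAN adds no edge between them.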
Slide Credit: Carlos Guestrin
– Bayesian Score
– Heuristic Search
– Efficient tricks with decomposable scores
– Over structures
– Over parameters of a structure
– Uniform: $P(G) \propto c$
– Sparsity prior: $P(G) \propto c^{|G|}$
– Prior penalizing number of parameters
– $P(G)$ should decompose like the family score
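As a small worked step, the sparsity prior does decompose over families if $|G|$ is read as the number of edges (an assumption made here; the slides do not define $|G|$), since every edge contributes to exactly one parent set:

```latex
\log P(G) = |G| \log c + \text{const}
          = \sum_{i=1}^{n} \big|\mathrm{Pa}^{G}_{X_i}\big| \log c + \text{const}
```

With $0 < c < 1$, each additional parent lowers the log prior by the same fixed amount.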
– If $P(\theta_G \mid G)$ is Dirichlet, then the integral has a closed form!
– And it factorizes according to families in G
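The integral in question is the marginal likelihood of the data given the structure. In the standard notation (assumed here), it averages the likelihood over the parameter prior:

```latex
P(\mathcal{D} \mid G) = \int P(\mathcal{D} \mid \theta_G, G)\, P(\theta_G \mid G)\, d\theta_G
```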
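$$P(D \mid G) = \prod_{i} \prod_{pa_i^G} \frac{\Gamma\big(\alpha(pa_i^G)\big)}{\Gamma\big(\alpha(pa_i^G) + N(pa_i^G)\big)} \prod_{x_i} \frac{\Gamma\big(\alpha(x_i, pa_i^G) + N(x_i, pa_i^G)\big)}{\Gamma\big(\alpha(x_i, pa_i^G)\big)}$$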
– K2 prior: fix an α, $P(\theta_{X_i \mid \mathrm{Pa}_{X_i}}) = \mathrm{Dirichlet}(\alpha, \ldots, \alpha)$
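A minimal sketch of one family's contribution to the log Bayesian score under a K2 prior, computed with log-Gamma functions for numerical stability. The function name `k2_family_log_score`, its arguments, and the list-of-tuples data layout are illustrative assumptions.

```python
import math
from collections import Counter


def k2_family_log_score(data, child, parents, child_arity, alpha=1.0):
    """Log marginal likelihood of one family under Dirichlet(alpha, ..., alpha).

    data is a list of tuples of discrete values; child is a column index,
    parents a tuple of column indices, child_arity the number of child values.
    """
    n_pa = Counter(tuple(row[p] for p in parents) for row in data)
    n_joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    alpha_pa = child_arity * alpha  # total pseudo-count per parent configuration
    score = 0.0
    # Parent configurations and child values that never occur contribute
    # log(1) = 0, so iterating over observed counts is enough.
    for n in n_pa.values():
        score += math.lgamma(alpha_pa) - math.lgamma(alpha_pa + n)
    for n in n_joint.values():
        score += math.lgamma(alpha + n) - math.lgamma(alpha)
    return score
```

Summing this quantity over all families gives the log marginal likelihood of the whole structure; adding the log structure prior gives the Bayesian score.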
– Remember that Dirichlet parameters are analogous to “fictitious samples”
– Pick a fictitious sample size m’
– Pick a “prior” BN
– Compute $P(X_i, \mathrm{Pa}_{X_i})$ under this prior BN
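Written out, this recipe sets each Dirichlet hyperparameter to the fictitious sample size times the corresponding probability under the prior BN. The symbol $P'$ for the prior network's distribution is notation assumed here:

```latex
\alpha(x_i, pa_i^G) = m' \cdot P'(x_i, pa_i^G),
\qquad
\alpha(pa_i^G) = \sum_{x_i} \alpha(x_i, pa_i^G) = m' \cdot P'(pa_i^G)
```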
– Note that adding an edge can hurt the Bayesian score, so choose a forest, not a tree
– Maximum spanning forest algorithm works
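A maximum spanning forest can be sketched as Kruskal's algorithm that stops once edge weights are no longer positive. The sketch assumes each undirected edge weight is the change in score from adding that edge (a modelling choice stated here, not in the slides); `max_spanning_forest` is a hypothetical helper name.

```python
def max_spanning_forest(n, weights):
    """Kruskal's algorithm that keeps only edges with positive weight.

    n is the number of nodes; weights maps an undirected edge (i, j)
    to the score gain obtained by adding it.
    """
    parent = list(range(n))  # union-find forest over the nodes

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    forest = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if w <= 0:
            break  # every remaining edge would not improve the score
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest
```

Edges are then directed away from an arbitrary root within each connected component, exactly as in the tree case.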
– The problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d≥2
– Exploit score decomposition
– (Quickly) Describe two heuristics that exploit decomposition in different ways
Local search, starting from the Chow-Liu tree
Possible moves (only if the result is acyclic!): add edge, delete edge, reverse edge
Pitfalls:
– Local maximum
– Plateau
Remedies:
– Random restart
– Tabu list
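A minimal sketch of how the candidate moves of a local search (add, delete, or reverse an edge) might be enumerated while enforcing acyclicity. The representation (a dict mapping each node to its set of parents) and the names `has_path` and `legal_moves` are assumptions made for illustration.

```python
from itertools import permutations


def has_path(parents, src, dst):
    """True if a directed path src -> ... -> dst exists.

    parents[v] is the set of parents of node v.
    """
    stack, seen = [dst], set()
    while stack:  # walk backwards from dst through parent sets
        v = stack.pop()
        if v == src:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False


def legal_moves(parents):
    """Yield (kind, u, v) moves on edge u -> v that keep the graph acyclic."""
    for u, v in permutations(parents, 2):
        if u in parents[v]:  # edge u -> v exists
            yield ("delete", u, v)
            # Reversing to v -> u is legal unless some other path
            # u -> ... -> v remains once the edge itself is removed.
            rest = {w: (ps - {u} if w == v else ps) for w, ps in parents.items()}
            if not has_path(rest, u, v):
                yield ("reverse", u, v)
        elif not has_path(parents, v, u):
            yield ("add", u, v)  # no path v -> ... -> u, so no cycle closes
```

A greedy search would score every legal move and apply the best one; a tabu list or random restarts can be layered on top of this enumeration.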
– Add or delete an edge: only rescore one family!
– Reverse an edge: rescore only two families
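Because the score is a sum over families, the change caused by a local move can be computed from the affected families alone. The sketch below assumes a caller-supplied decomposable `family_score(child, parent_set)` and a dict of frozenset parent sets; both, along with the name `move_delta`, are hypothetical.

```python
def move_delta(move, parents, family_score):
    """Score change of a local move, touching only the families that change.

    move is (kind, u, v) and refers to the edge u -> v;
    parents[v] is the frozenset of parents of v.
    """
    kind, u, v = move

    def change(child, new_parents):
        return family_score(child, new_parents) - family_score(child, parents[child])

    if kind == "add":      # only v's family changes
        return change(v, parents[v] | {u})
    if kind == "delete":   # only v's family changes
        return change(v, parents[v] - {u})
    if kind == "reverse":  # u -> v becomes v -> u: two families change
        return change(v, parents[v] - {u}) + change(u, parents[u] | {v})
    raise ValueError(f"unknown move: {kind}")
```

After a move is applied, only the deltas of moves that touch the changed families need to be recomputed; all others can be reused from a cache.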
[Figure: local move (add edge) on the network over Flu, Allergy, Sinus, Headache, Nose]
Alarm network
– Similar to averaging over parameters
– Log-likelihood score
– Bayesian Score
– For trees
– General graphs