ECE 6504: Advanced Topics in Machine Learning (PowerPoint Presentation)


Slide 1

ECE 6504: Advanced Topics in Machine Learning

Probabilistic Graphical Models and Large-Scale Learning

Dhruv Batra Virginia Tech

Topics

– Bayes Nets – (Finish) Structure Learning

Readings: KF 18.4; Barber 9.5, 10.4

Slide 2

Administrativia

  • HW1

– Out
– Due in 2 weeks: Feb 17, Feb 19, 11:59pm
– Please, please start early
– Implementation: TAN, structure + parameter learning
– Please post questions on the Scholar Forum.

(C) Dhruv Batra 2

Slide 3

Recap of Last Time

Slide 4

Learning Bayes nets

Data x(1), …, x(m) → structure and parameters (CPTs: P(Xi | Pa_Xi))

                        Known structure      Unknown structure
Fully observable data   Very easy            Hard
Missing data            Somewhat easy (EM)   Very very hard

Slide Credit: Carlos Guestrin

Slide 5

Types of Errors

  • Truth:
  • Recovered:

[Figure: the true Flu/Allergy/Sinus/Headache/Nose network alongside recovered structures]

Slide 6

Score-based approach

Data: <x1(1), …, xn(1)>, …, <x1(m), …, xn(m)>

Enumerate possible structures over {Flu, Allergy, Sinus, Headache, Nose}; for each candidate, score the structure and learn its parameters. [Figure: three candidate networks with scores 52, 60, and 500]

Slide Credit: Carlos Guestrin

Slide 7

How many graphs?

  • N vertices.
  • How many (undirected) graphs?
  • How many (undirected) trees?
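The counts the slide asks for are standard combinatorics (the answers are not shown on the slide, so take this as a sketch): there are 2^C(N,2) undirected graphs on N labeled vertices, and N^(N-2) labeled trees by Cayley's formula.

```python
def num_undirected_graphs(n):
    # Each of the C(n, 2) = n(n-1)/2 possible edges is present or absent.
    return 2 ** (n * (n - 1) // 2)

def num_labeled_trees(n):
    # Cayley's formula: n^(n-2) labeled trees on n vertices (n >= 2).
    return n ** (n - 2)

# For the 5-node Flu/Allergy/Sinus/Headache/Nose example:
print(num_undirected_graphs(5))  # 1024
print(num_labeled_trees(5))      # 125
```

The gap between the two counts is why restricting the search to trees (Chow-Liu, later in the lecture) makes structure learning tractable.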

Slide 8

What’s a good score?

  • Score(G) = log-likelihood(G : D, θ_MLE) = log P(D | θ_MLE, G)

Slide 9

Information-theoretic interpretation of Maximum Likelihood Score

  • Maximum likelihood score decomposition:

log P(D | θ_MLE, G) = m Σ_i I(X_i; Pa_{X_i}) − m Σ_i H(X_i)

  • Implications:

– Intuitive: higher mutual information → higher score
– Decomposes over families in the BN (a node and its parents)
– Same score for I-equivalent structures!

Slide 10

Log-Likelihood Score Overfits

  • Adding an edge only improves score!

– Thus, MLE = complete graph

  • Two fixes:

– Restrict space of graphs

  • say only d parents allowed (d = 1 → trees)

– Put priors on graphs

  • Prefer sparser graphs

Slide 11

Chow-Liu tree learning algorithm 1

  • For each pair of variables Xi, Xj

– Compute the empirical distribution: P̂(x_i, x_j) = Count(x_i, x_j) / m
– Compute the mutual information: I(X_i; X_j) = Σ_{x_i, x_j} P̂(x_i, x_j) log [ P̂(x_i, x_j) / (P̂(x_i) P̂(x_j)) ]

  • Define a graph

– Nodes X1, …, Xn
– Edge (i, j) gets weight I(X_i; X_j)

Slide Credit: Carlos Guestrin

Slide 12

Chow-Liu tree learning algorithm 2

  • Optimal tree BN

– Compute a maximum-weight spanning tree
– Directions in BN: pick any node as root, and direct edges away from the root

  • breadth-first-search defines directions
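The two steps above (mutual-information weights, then a maximum-weight spanning tree oriented away from a root) can be sketched as a toy implementation, assuming discrete data in an (m, n) integer array:

```python
import numpy as np

def chow_liu(data, root=0):
    """Chow-Liu sketch: weight each pair by empirical mutual information,
    build a maximum-weight spanning tree (Kruskal + union-find), then
    direct edges away from `root` by breadth-first search."""
    m, n = data.shape

    def mi(i, j):
        # Empirical mutual information between columns i and j (in nats).
        x, y = data[:, i], data[:, j]
        total = 0.0
        for a in np.unique(x):
            for b in np.unique(y):
                pxy = np.mean((x == a) & (y == b))
                if pxy > 0:
                    total += pxy * np.log(pxy / (np.mean(x == a) * np.mean(y == b)))
        return total

    # Maximum-weight spanning tree: consider heaviest candidate edges first.
    cand = sorted(((mi(i, j), i, j) for i in range(n) for j in range(i + 1, n)),
                  reverse=True)
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    undirected = []
    for w, i, j in cand:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            undirected.append((i, j))

    # Orient edges away from the root (BFS order defines the directions).
    adj = {u: [] for u in range(n)}
    for i, j in undirected:
        adj[i].append(j)
        adj[j].append(i)
    directed, seen, queue = [], {root}, [root]
    while queue:
        u = queue.pop(0)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                directed.append((u, v))
                queue.append(v)
    return directed
```

With perfectly correlated columns 0 and 1 and a weakly related column 2, the learned tree keeps 0 and 1 adjacent and attaches 2 by its strongest link.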

Slide Credit: Carlos Guestrin

Slide 13

Can we extend Chow-Liu?

  • Tree augmented naïve Bayes (TAN) [Friedman et al. ’97]

– The naïve Bayes model overcounts evidence, because correlation between features is not considered
– Same as Chow-Liu, but score edges with the conditional mutual information given the class: I(X_i; X_j | C)
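Concretely, TAN swaps the Chow-Liu edge weight I(Xi; Xj) for the conditional mutual information given the class. A sketch for discrete arrays (names illustrative):

```python
import numpy as np

def cond_mutual_info(x, y, c):
    """Empirical conditional mutual information I(X; Y | C) in nats,
    the TAN edge weight: x, y are discrete 1-D integer arrays and c
    is the class variable."""
    total = 0.0
    for cv in np.unique(c):
        mask = (c == cv)
        pc = np.mean(mask)
        xs, ys = x[mask], y[mask]
        for a in np.unique(xs):
            for b in np.unique(ys):
                pxy = np.mean((xs == a) & (ys == b))
                if pxy > 0:
                    total += pc * pxy * np.log(
                        pxy / (np.mean(xs == a) * np.mean(ys == b)))
    return total

# X and Y look independent overall, but are perfectly dependent within
# each class: I(X; Y) = 0 while I(X; Y | C) = log 2.
x = np.array([0, 1, 0, 1])
y = np.array([0, 1, 1, 0])
c = np.array([0, 0, 1, 1])
print(cond_mutual_info(x, y, c))  # ~0.693
```

The example shows exactly the overcounting issue: conditioning on the class reveals a feature correlation the marginal mutual information misses.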

Slide Credit: Carlos Guestrin

Slide 14

Plan for today

  • (Finish) BN Structure Learning

– Bayesian score
– Heuristic search
– Efficient tricks with decomposable scores

Slide 15

Bayesian score

  • Bayesian view → prior distributions:

– Over structures
– Over parameters of a structure

  • Posterior over structures given data:

P(G | D) ∝ P(D | G) P(G), where P(D | G) = ∫ P(D | θ_G, G) P(θ_G | G) dθ_G

Slide 16

Structure Prior

  • Common choices:

– Uniform: P(G) ∝ c
– Sparsity prior: P(G) ∝ c^{|G|}
– Prior penalizing the number of parameters
– P(G) should decompose like the family score

Slide 17

Parameter Prior and Integrals

  • Important Result:

– If P(θ_G | G) is Dirichlet, then the integral has a closed form!
– And it factorizes according to the families in G

Dirichlet marginal likelihood for the multinomial P(X_i | pa_i):

P(D | G) = ∏_i ∏_{pa_i^G} [ Γ(α(pa_i^G)) / Γ(α(pa_i^G) + N(pa_i^G)) ] ∏_{x_i} [ Γ(α(x_i, pa_i^G) + N(x_i, pa_i^G)) / Γ(α(x_i, pa_i^G)) ]
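The closed form can be evaluated per family with log-gamma functions. A sketch (the count/hyperparameter layout is an assumption of this example):

```python
from math import lgamma, exp

def family_log_marginal_likelihood(counts, alpha):
    """Log Dirichlet marginal likelihood for one family P(Xi | Pa_i).
    counts[p][k] is N(xi = k, pa_i = p); alpha[p][k] is the matching
    Dirichlet hyperparameter. Sums the closed form over parent configs."""
    ll = 0.0
    for n_row, a_row in zip(counts, alpha):
        # Gamma(alpha(pa)) / Gamma(alpha(pa) + N(pa)) term
        ll += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        # Product over values of Xi
        for n, a in zip(n_row, a_row):
            ll += lgamma(a + n) - lgamma(a)
    return ll

# Sanity check: one binary variable, no parents, uniform Dirichlet(1, 1)
# prior; a single observation has marginal likelihood 1/2.
print(exp(family_log_marginal_likelihood([[1, 0]], [[1, 1]])))  # 0.5
```

Because the full score is a sum of such family terms, local search moves (later in the lecture) only need to re-evaluate the families they touch.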

Slide 18

Parameter Prior and Integrals

  • How should we choose Dirichlet hyperparameters?

– K2 prior: fix an α, P(θXi|PaXi) = Dirichlet(α,…, α)

  • K2 is “inconsistent”

Slide 19

BDe Prior

– Remember that Dirichlet parameters are analogous to "fictitious samples"
– Pick a fictitious sample size m′
– Pick a "prior" BN

  • Usually independent (product of marginals)

– Compute P(Xi, Pa_Xi) under this prior BN

  • BDe prior: α(xi, pa_Xi) = m′ · P(xi, pa_Xi)
  • Has the consistency property

Slide 20

Chow-Liu for Bayesian score

  • Edge weight w_{Xj → Xi} is the advantage of adding Xj as a parent of Xi
  • We now have a directed graph, so we need a directed spanning forest

– Note that adding an edge can hurt the Bayesian score, so choose a forest, not a tree
– A maximum spanning forest algorithm works

Slide 21

Structure learning for general graphs

  • In a tree, a node only has one parent
  • Theorem:

– The problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d≥2

  • Most structure learning approaches use heuristics

– Exploit score decomposition
– (Quickly) describe two heuristics that exploit decomposition in different ways

Slide 22

Structure learning using local search

Starting from the Chow-Liu tree, do local search. Possible moves (only if the result stays acyclic!):

  • Add edge
  • Delete edge
  • Invert edge

Select moves using your favorite score.
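These moves can be sketched as a greedy hill climber over DAG edge sets; `score` is a placeholder for whatever (ideally decomposable) score you plug in:

```python
def has_cycle(edges, n):
    """DFS cycle check for a directed edge set over nodes 0..n-1."""
    adj = {u: [] for u in range(n)}
    for u, v in edges:
        adj[u].append(v)
    color = [0] * n  # 0 = unvisited, 1 = on stack, 2 = done

    def dfs(u):
        color[u] = 1
        for v in adj[u]:
            if color[v] == 1 or (color[v] == 0 and dfs(v)):
                return True
        color[u] = 2
        return False

    return any(color[u] == 0 and dfs(u) for u in range(n))

def local_search(n, score, init=frozenset(), iters=100):
    """Greedy hill climbing: try every add, delete, and invert move,
    keep only acyclic candidates, take the best-scoring one, and stop
    at a local maximum."""
    g = frozenset(init)
    for _ in range(iters):
        moves = []
        for u in range(n):
            for v in range(n):
                if u == v:
                    continue
                if (u, v) in g:
                    moves.append(g - {(u, v)})             # delete edge
                    moves.append(g - {(u, v)} | {(v, u)})  # invert edge
                elif (v, u) not in g:
                    moves.append(g | {(u, v)})             # add edge
        moves = [m for m in moves if not has_cycle(m, n)]  # only if acyclic!
        if not moves:
            return g
        best = max(moves, key=score)
        if score(best) <= score(g):
            return g  # local maximum reached
        g = best
    return g

# Toy score: reward the edge (0, 1), lightly penalize every edge.
score = lambda g: (1.0 if (0, 1) in g else 0.0) - 0.1 * len(g)
print(local_search(3, score))  # frozenset({(0, 1)})
```

This naive version rescores whole graphs; the decomposition trick on the next slides makes each move cost only one or two family rescoring operations.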

Slide 23

Structure learning using local search

  • Problems:

– Local maxima
– Plateaus

  • Strategies:

– Random restarts
– Tabu list

Slide 24

Exploit score decomposition in local search

  • Add edge and delete edge:

– Only rescore one family!

  • Reverse edge

– Rescore only two families

[Figure: a local move (Add Edge) shown on two copies of the Flu/Allergy/Sinus/Headache/Nose network]

Slide 25

Example: the Alarm network

Slide 26

Example

  • JamBayes [Horvitz et al UAI05]

Slide 27

Example

  • JamBayes [Horvitz et al UAI05]

Slide 28

Example

  • JamBayes [Horvitz et al]

Slide 29

Bayesian model averaging

  • So far, we have selected a single structure
  • But, if you are really Bayesian, you must average over structures

– Similar to averaging over parameters

Slide 30

BN: Structure Learning: What you need to know

  • Score-based approach

– Log-likelihood score

  • Use θMLE
  • Information theoretic interpretation
  • Overfits! Adding edges only helps

– Bayesian Score

  • Priors over structure and priors over parameters for a structure
  • If Dirichlet: closed-form expression for P(D|G)
  • K2 Dirichlet not enough; need BDe for consistency
  • Structure Search

– For trees

  • Chow-Liu: max-weight spanning tree
  • Can be extended to forests and TAN

– General graphs

  • Heuristic Search
  • Efficiency tricks due to decomposable score
