Slide 1

Divergence measures and message passing

Tom Minka Microsoft Research Cambridge, UK

with thanks to the Machine Learning and Perception Group

Slide 2

Message-Passing Algorithms

  • PEP: Power EP [Minka 04]
  • FBP: Fractional belief propagation [Wiegerinck,Heskes 02]
  • TRW: Tree-reweighted message passing [Wainwright,Jaakkola,Willsky 03]
  • EP: Expectation propagation [Minka 01]
  • BP: Loopy belief propagation [Frey,MacKay 97]
  • MF: Mean-field [Peterson,Anderson 87]

Slide 3

Outline

  • Example of message passing
  • Interpreting message passing
  • Divergence measures
  • Message passing from a divergence measure
  • Big picture
Slide 4

Outline

  • Example of message passing
  • Interpreting message passing
  • Divergence measures
  • Message passing from a divergence measure
  • Big picture
Slide 5

Estimation Problem

(Figure: a factor graph over variables x, y, z with factors a, b, c, d, e, f.)

Slide 6

Estimation Problem

(Figure: the same factor graph, now with observed values (1) and unknowns (?) attached to the variables.)

Slide 7

Estimation Problem

(Figure: the variables x, y, z with their connecting factors.)

Slide 8

Estimation Problem

Queries: the marginals, the normalizing constant, and the argmax. We want to answer these quickly.

Slide 9

Belief Propagation

(Figure: messages flowing between x, y, and z.)

Slide 10

Belief Propagation

(Figure: the final messages on the x, y, z graph.)

Slide 11

Belief Propagation

  • Marginals: exact vs. BP values (shown in the slide's figure)
  • Normalizing constant: 0.45 (exact) vs. 0.44 (BP)
  • Argmax: (0,0,0) (exact) vs. (0,0,0) (BP)

Slide 12

Outline

  • Example of message passing
  • Interpreting message passing
  • Divergence measures
  • Message passing from a divergence measure
  • Big picture
Slide 13

Message Passing = Distributed Optimization

  • Messages represent a simpler distribution q(x) that approximates p(x)
    – a distributed representation
  • Message passing = optimizing q to fit p
    – q stands in for p when answering queries
  • Parameters:
    – what type of distribution to construct (approximating family)
    – what cost to minimize (divergence measure)

Slide 14

How to make a message-passing algorithm

  1. Pick an approximating family (fully factorized, Gaussian, etc.)
  2. Pick a divergence measure
  3. Construct an optimizer for that measure, usually a fixed-point iteration (a minimal building block is sketched below)
  4. Distribute the optimization across factors
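To make step 3 concrete, here is a minimal sketch (mine, not from the talk) of the basic optimizer building block: projecting a tabulated, unnormalized distribution onto the Gaussian family. For KL(p||q), i.e. α = 1, the exact projection is moment matching; the grid and names are illustrative.

```python
import numpy as np

def project_to_gaussian(x, p):
    """Moment-matching projection of an unnormalized density p, tabulated
    on the uniform grid x, onto the Gaussian family. Moment matching is
    the exact minimizer of KL(p||q) over Gaussians (the alpha = 1 case)."""
    dx = x[1] - x[0]
    mass = p.sum() * dx                            # estimate of the normalizing constant
    mean = (x * p).sum() * dx / mass               # first moment
    var = ((x - mean) ** 2 * p).sum() * dx / mass  # central second moment
    return mass, mean, var

# Example: project a bimodal density onto a single Gaussian.
x = np.linspace(-10.0, 10.0, 2001)
p = np.exp(-0.5 * (x + 2) ** 2) + 0.5 * np.exp(-0.5 * (x - 3) ** 2 / 0.25)
print(project_to_gaussian(x, p))
```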
Slide 15

Outline

  • Example of message passing
  • Interpreting message passing
  • Divergence measures
  • Message passing from a divergence measure
  • Big picture
Slide 16

Kullback-Leibler (KL) divergence, for unnormalized distributions p and q:

  KL(p || q) = ∫ p(x) log( p(x) / q(x) ) dx + ∫ ( q(x) − p(x) ) dx

Alpha-divergence, for any real number α:

  Dα(p || q) = ( ∫ [ α p(x) + (1−α) q(x) − p(x)^α q(x)^(1−α) ] dx ) / ( α (1−α) )

Both divergences are asymmetric in (p, q) and convex.
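As a numerical companion to these definitions, here is a small sketch (my own, not from the talk) that evaluates Dα on a grid; the α → 0 and α → 1 limits, which recover KL(q||p) and KL(p||q), are handled explicitly:

```python
import numpy as np

def alpha_divergence(p, q, alpha, dx):
    """D_alpha(p||q) for unnormalized densities p, q tabulated on a uniform
    grid with spacing dx. The limits alpha -> 0 and alpha -> 1 are
    KL(q||p) and KL(p||q) respectively."""
    if alpha == 0.0:                      # limit: KL(q||p)
        return np.sum(q * np.log(q / p) + p - q) * dx
    if alpha == 1.0:                      # limit: KL(p||q)
        return np.sum(p * np.log(p / q) + q - p) * dx
    integrand = alpha * p + (1 - alpha) * q - p**alpha * q**(1 - alpha)
    return np.sum(integrand) * dx / (alpha * (1 - alpha))

# Sanity check: D_alpha approaches KL(p||q) as alpha -> 1.
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x**2)
q = 0.7 * np.exp(-0.5 * (x - 1) ** 2 / 4)
print(alpha_divergence(p, q, 0.999, dx), alpha_divergence(p, q, 1.0, dx))
```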

Slide 17

Examples of alpha-divergence
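The special cases presumably shown here are the standard ones (they appear in Minka's accompanying tech report); in LaTeX:

```latex
\begin{align*}
\lim_{\alpha \to 0} D_\alpha(p \,\|\, q) &= \mathrm{KL}(q \,\|\, p) \\
\lim_{\alpha \to 1} D_\alpha(p \,\|\, q) &= \mathrm{KL}(p \,\|\, q) \\
D_{1/2}(p \,\|\, q) &= 2 \int \bigl(\sqrt{p(x)} - \sqrt{q(x)}\bigr)^{2}\, dx
   && \text{(Hellinger distance)} \\
D_{2}(p \,\|\, q) &= \tfrac{1}{2} \int \frac{(p(x) - q(x))^{2}}{q(x)}\, dx
   && \text{($\chi^2$ divergence)}
\end{align*}
```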

Slide 18

Minimum alpha-divergence

(Figure: the Gaussian q that minimizes Dα(p||q), shown for α = −∞.)

Slide 19

Minimum alpha-divergence

(Figure: the Gaussian q that minimizes Dα(p||q), shown for α = 0.)

Slide 20

Minimum alpha-divergence

(Figure: the Gaussian q that minimizes Dα(p||q), shown for α = 0.5.)

Slide 21

Minimum alpha-divergence

(Figure: the Gaussian q that minimizes Dα(p||q), shown for α = 1.)

Slide 22

Minimum alpha-divergence

(Figure: the Gaussian q that minimizes Dα(p||q), shown for α = ∞.)

Slide 23

Properties of alpha-divergence

  • α ≤ 0 seeks the mode with the largest mass (not the tallest)
    – zero-forcing: p(x) = 0 forces q(x) = 0
    – underestimates the support of p
  • α ≥ 1 stretches to cover everything
    – inclusive: p(x) > 0 forces q(x) > 0
    – overestimates the support of p

[Frey,Patrascu,Jaakkola,Moran 00]
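The preceding figures can be reproduced numerically. Here is a brute-force sketch under my own choice of bimodal target (the talk's exact p is not given; SciPy is assumed available). As α grows, the fitted Gaussian should switch from hugging one mode to covering both:

```python
import numpy as np
from scipy.optimize import minimize

# Bimodal target: a wide bump and a narrower, taller bump.
x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x + 3) ** 2) + 2.0 * np.exp(-0.5 * (x - 3) ** 2 / 0.5)

def gauss(mass, mean, var):
    return mass * np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def d_alpha(q, alpha):
    integrand = alpha * p + (1 - alpha) * q - p**alpha * q**(1 - alpha)
    return np.sum(integrand) * dx / (alpha * (1 - alpha))

# Parametrize by (log mass, mean, log var) so the search is unconstrained.
for alpha in [-1.0, 0.5, 0.99]:   # avoid the exact alpha = 0, 1 limit cases
    t = minimize(lambda t: d_alpha(gauss(np.exp(t[0]), t[1], np.exp(t[2])), alpha),
                 x0=[0.0, 0.0, 0.0], method="Nelder-Mead").x
    print(f"alpha = {alpha:5.2f}: mean = {t[1]:6.2f}, var = {np.exp(t[2]):6.2f}")
```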

Slide 24

Structure of alpha space

(Figure: the α axis. Zero-forcing behavior lies at α ≤ 0; inclusive, zero-avoiding behavior at α ≥ 1. MF sits at α = 0; BP and EP at α = 1; TRW uses α > 1; FBP and PEP cover arbitrary α.)

Slide 25

Other properties

  • If q is an exact minimum of the alpha-divergence, the mass of q estimates p's normalizing constant: ∫ q(x) dx = ∫ p(x)^α q(x)^(1−α) dx
  • If α = 1: a Gaussian q matches the mean and variance of p
    – a fully factorized q matches the marginals of p
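A short derivation of the normalizing-constant property (my reconstruction; it matches the fixed-point condition in Minka's tech report): scale q as q = c·r with the shape r held fixed, and set the derivative of Dα with respect to c to zero:

```latex
\frac{\partial}{\partial c} D_\alpha(p \,\|\, c\,r)
 = \frac{(1-\alpha)\left[\int r(x)\,dx - c^{-\alpha}\int p(x)^{\alpha} r(x)^{1-\alpha}\,dx\right]}{\alpha(1-\alpha)} = 0
\;\Longrightarrow\;
\int q(x)\,dx = \int p(x)^{\alpha}\, q(x)^{1-\alpha}\,dx .
```

At α = 1 the right-hand side is ∫ p(x) dx, so the mass of q reproduces the normalizing constant of p exactly; for other α it gives an estimate.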

Slide 26

Two-node example

  • q is fully factorized and minimizes the α-divergence to p
  • q has correct marginals only for α = 1 (BP)

(Figure: a two-node graph, x connected to y. A numerical check follows below.)
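A small numerical check of this claim (my own construction; the slide's actual p is not shown): brute-force the best normalized factorized q for a strongly correlated two-node binary p and compare marginals.

```python
import numpy as np
from itertools import product

# Strongly correlated two-node binary distribution (illustrative values).
p = np.array([[0.45, 0.05],
              [0.05, 0.45]])

def d_alpha(p, q, alpha):
    """Alpha-divergence for discrete p, q, with the alpha -> 0 and
    alpha -> 1 KL limits handled explicitly."""
    if alpha == 0.0:
        return np.sum(q * np.log(q / p) + p - q)
    if alpha == 1.0:
        return np.sum(p * np.log(p / q) + q - p)
    s = np.sum(alpha * p + (1 - alpha) * q - p**alpha * q**(1 - alpha))
    return s / (alpha * (1 - alpha))

def best_factorized(alpha, grid=199):
    """Brute-force the best normalized factorized q(x,y) = qx(x) qy(y)."""
    ts = np.linspace(0.005, 0.995, grid)
    best = (np.inf, None)
    for a, b in product(ts, ts):          # a = q(x=0), b = q(y=0)
        q = np.outer([a, 1 - a], [b, 1 - b])
        best = min(best, (d_alpha(p, q, alpha), (a, b)))
    return best

for alpha in [0.0, 0.5, 1.0]:
    d, (a, b) = best_factorized(alpha)
    print(f"alpha={alpha}: q(x=0)={a:.2f}  (true marginal p(x=0)=0.50)")
```

With this p, α = 1 recovers the exact marginal 0.50, while α = 0 (MF) breaks symmetry toward one mode and reports a biased marginal.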

Slide 27

Two-node example

(Figure: fully factorized fits to a bimodal distribution for α = 1 (BP), α = 0 (MF), and α ≤ 0.5, each annotated with what it gets right ("good") and wrong ("bad") among: marginals, mass, zeros, capturing one peak, and peak heights. Matching the properties above, the α = 1 fit is good on marginals and mass but poor on zeros, while the small-α fits capture one peak and its zeros well but misestimate marginals and mass.)

Slide 28

Two-node example

(Figure: the fit for α = ∞ on the same bimodal distribution, annotated with its good and bad properties among zeros, marginals, and peak heights. As the most inclusive choice, it stretches to cover the entire support of p.)

Slide 29

Lessons

  • Neither method is inherently superior; it depends on what you care about
  • A factorized approximation does not imply matching marginals (that holds only for α = 1)
  • Adding y to the problem can change the estimated marginal for x (even though the true marginal of x is unchanged)

Slide 30

Outline

  • Example of message passing
  • Interpreting message passing
  • Divergence measures
  • Message passing from a divergence measure
  • Big picture
Slide 31

Distributed divergence minimization

Slide 32

Distributed divergence minimization

  • Write p as a product of factors: p(x) = ∏a fa(x)
  • Approximate the factors one by one: fa(x) → f̃a(x)
  • Multiply the approximate factors to get the approximation: q(x) = ∏a f̃a(x)

Slide 33

Global divergence to local divergence

  • Global divergence: D( ∏a fa(x) || ∏a f̃a(x) )
  • Local divergence: D( fa(x) q\a(x) || f̃a(x) q\a(x) ), where q\a(x) = ∏b≠a f̃b(x) is the product of all other approximate factors
Slide 34

Message passing

  • Messages are passed between factors
  • Messages are factor approximations: the message from factor a is f̃a
  • Factor a receives the other factors' messages, which define q\a:
    – minimize the local divergence to get a new f̃a
    – send f̃a to the other factors
    – repeat until convergence
  • This recipe produces all 6 algorithms (a schematic sketch follows below)
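A schematic of this loop in code (my paraphrase, not the talk's implementation): with a fully factorized family, discrete variables, and the α = 1 local projection, the recipe below reduces to loopy BP. All function and variable names are illustrative.

```python
import numpy as np

def message_passing(factors, n_vars, n_states, iters=50):
    """Fully factorized message passing on a discrete factor graph, phrased
    as repeated local divergence minimization with alpha = 1 (this choice
    reproduces loopy BP). `factors` is a list of (vars, table) pairs."""
    # msg[(a, i)]: factor a's current approximate factor on variable i
    msg = {(a, i): np.ones(n_states)
           for a, (vs, _) in enumerate(factors) for i in vs}

    def context(a, j):
        # Product of the other factors' messages into variable j:
        # the restriction of q_{-a} to variable j.
        out = np.ones(n_states)
        for b, (ws, _) in enumerate(factors):
            if b != a and j in ws:
                out = out * msg[(b, j)]
        return out

    for _ in range(iters):
        for a, (vs, table) in enumerate(factors):
            # Tilted distribution f_a(x) * q_{-a}(x) on this factor's variables.
            ctx = [context(a, j) for j in vs]
            tilted = table.astype(float)
            for ax in range(len(vs)):
                shape = [1] * len(vs)
                shape[ax] = n_states
                tilted = tilted * ctx[ax].reshape(shape)
            for ax, i in enumerate(vs):
                # alpha = 1 local projection: match the marginal on variable i,
                # then divide out the context to get the new message.
                other_axes = tuple(k for k in range(len(vs)) if k != ax)
                new = tilted.sum(axis=other_axes) / ctx[ax]
                msg[(a, i)] = new / new.sum()   # normalize for numerical stability

    # Belief for each variable: product of all incoming messages, normalized.
    beliefs = []
    for i in range(n_vars):
        b = np.ones(n_states)
        for a, (vs, _) in enumerate(factors):
            if i in vs:
                b = b * msg[(a, i)]
        beliefs.append(b / b.sum())
    return beliefs

# Example: three binary variables in a loop with attractive pairwise factors.
pair = np.array([[2.0, 1.0],
                 [1.0, 2.0]])
factors = [((0, 1), pair), ((1, 2), pair), ((0, 2), pair)]
for i, b in enumerate(message_passing(factors, n_vars=3, n_states=2)):
    print(f"q(x{i}) = {b}")
```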
Slide 35

Global divergence vs. local divergence

In general, local ≠ global:

  • but the results are similar
  • BP doesn't minimize the global KL, but comes close

(Figure: along the α axis, MF (α = 0) is the one case where local = global, so there is no loss from message passing; for other α, local ≠ global.)

Slide 36

Experiment

  • Which message-passing algorithm is best at minimizing the global Dα(p||q)?
  • Procedure:
    1. Run FBP with various values of the local αL
    2. Compute the global divergence DαG for various αG
    3. Find the best αL (i.e., the best algorithm) for each αG
Slide 37

Results

  • Average over 20 graphs with random singleton and pairwise potentials
  • Mixed potentials (~ (−1, 1)):
    – best αL = αG (the local divergence should match the global one)
    – FBP with the same α is best at minimizing Dα
  • BP is best at minimizing KL(p||q)
Slide 38

Outline

  • Example of message passing
  • Interpreting message passing
  • Divergence measures
  • Message passing from a divergence measure
  • Big picture
Slide 39

Hierarchy of algorithms

  • MF: fully factorized, KL(q||p)
  • Structured MF: exp family, KL(q||p)
  • BP: fully factorized, KL(p||q)
  • EP: exp family, KL(p||q)
  • FBP: fully factorized, Dα(p||q)
  • TRW: fully factorized, Dα(p||q) with α > 1
  • Power EP: exp family, Dα(p||q)
Slide 40

Matrix of algorithms

  divergence measure       fully factorized          exp family
  KL(q||p)                 MF                        Structured MF
  KL(p||q)                 BP                        EP
  Dα(p||q)                 FBP (TRW for α > 1)       Power EP

Open questions on the slide: other approximating families? (e.g., mixtures) Other divergences?

Slide 41

Other Message Passing Algorithms

Do they correspond to divergence measures?

  • Generalized belief propagation

[Yedidia,Freeman,Weiss 00]

  • Iterated conditional modes [Besag 86]
  • Max-product belief revision
  • TRW-max-product [Wainwright,Jaakkola,Willsky 02]
  • Laplace propagation [Smola,Vishwanathan,Eskin 03]
  • Penniless propagation [Cano,Moral,Salmerón 00]
  • Bound propagation [Leisink,Kappen 03]
Slide 42

Future work

  • Understand existing message-passing algorithms
  • Understand local vs. global divergence
  • New message-passing algorithms:
    – specialized divergence measures
    – richer approximating families
  • Other ways to minimize divergence