Message Passing Algorithms for Optimization. Nicholas Ruozzi. Advisor: Sekhar Tatikonda. Yale University. (PowerPoint PPT presentation)



SLIDE 1

Message Passing Algorithms for Optimization

Nicholas Ruozzi. Advisor: Sekhar Tatikonda. Yale University

SLIDE 2

The Problem

• Minimize a real-valued objective function that factorizes as a sum of potentials
• The potentials are indexed by a multiset whose elements are subsets of the indices 1, …, n
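The factorization itself appeared as an image on the slide. In the standard min-sum setup it reads as follows; the symbols 𝒜, φ, and ψ here are my notation, not recovered from the slide:

```latex
f(x_1, \dots, x_n) \;=\; \sum_{i=1}^{n} \phi_i(x_i) \;+\; \sum_{\alpha \in \mathcal{A}} \psi_\alpha(x_\alpha)
```

where each α ∈ 𝒜 is a subset of {1, …, n} and x_α denotes the subvector of variables indexed by α.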

SLIDE 3

Corresponding Graph

[Figure: factor graph on variables 1, 2, 3]

SLIDE 4

Local Message Passing Algorithms

• Pass messages on this graph to minimize f
• Distributed message passing algorithm
• Ideal for large scientific problems, sensor networks, etc.

[Figure: factor graph on variables 1, 2, 3]

SLIDE 5

The Min-Sum Algorithm

• Messages at time t:

[Figure: factor graph on variables 1, 2, 3, 4]
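The update equations were images on the slide. A standard form of the min-sum updates, in one common convention (my reconstruction, not the slide's exact formulas), is:

```latex
m^{t}_{i \to \alpha}(x_i) \;=\; \phi_i(x_i) \;+\; \sum_{\beta \ni i,\, \beta \neq \alpha} m^{t-1}_{\beta \to i}(x_i)
\qquad
m^{t}_{\alpha \to i}(x_i) \;=\; \min_{x'_\alpha \,:\, (x'_\alpha)_i = x_i} \Big[ \psi_\alpha(x'_\alpha) \;+\; \sum_{j \in \alpha,\, j \neq i} m^{t-1}_{j \to \alpha}\big((x'_\alpha)_j\big) \Big]
```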

SLIDE 6

Computing Beliefs

• The min-marginal corresponding to the ith variable is obtained by minimizing f over all of the other variables
• Beliefs approximate the min-marginals
• Estimate the optimal assignment from the argmins of the beliefs
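In the usual notation (my reconstruction; the slide's equations were images), the beliefs and the estimate are:

```latex
b^{t}_i(x_i) \;=\; \phi_i(x_i) \;+\; \sum_{\alpha \ni i} m^{t}_{\alpha \to i}(x_i),
\qquad
x^*_i \;\in\; \arg\min_{x_i} b^{t}_i(x_i)
```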

SLIDE 7

Min-Sum: Convergence Properties

• Iterations do not necessarily converge
• Always converges when the factor graph is a tree
• Converged estimates need not correspond to the optimal solution
• Performs well empirically

SLIDE 8

Previous Work

• Prior work focused on two aspects of message passing algorithms
  • Convergence
    • Coordinate ascent schemes
    • Not necessarily local message passing algorithms
  • Correctness
    • No combinatorial characterization of failure modes
    • Concerned only with global optimality

SLIDE 9

Contributions

• A new local message passing algorithm
  • Parameterized family of message passing algorithms
• Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a global optimum
• Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a local optimum

SLIDE 10

Contributions

• What makes a graphical model “good”?
• Combinatorial understanding of the failure modes of the splitting algorithm via graph covers
  • Can be extended to other iterative algorithms
• Techniques for handling objective functions for which the known convergent algorithms fail
  • Reparameterization-centric approach

SLIDE 11

Publications

Convergent and correct message passing schemes for optimization problems over graphical models.
Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), July 2010.

Fixing Max-Product: A Unified Look at Message Passing Algorithms (invited talk).
Proceedings of the Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, September 2010.

Unconstrained minimization of quadratic functions via min-sum.
Proceedings of the Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, March 2010.

Graph covers and quadratic minimization.
Proceedings of the Forty-Seventh Annual Allerton Conference on Communication, Control, and Computing, September 2009.

s-t paths using the min-sum algorithm.
Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, Control, and Computing, September 2008.

SLIDE 12

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 13

The Problem

• Minimize a real-valued objective function that factorizes as a sum of potentials
• The potentials are indexed by a multiset whose elements are subsets of the indices 1, …, n

SLIDE 14

Factorizations

• Some factorizations are better than others
• If xi takes one of k values, this requires at most 2k² + k operations
SLIDE 15

Factorizations

• Some factorizations are better than others
• Suppose the objective function factorizes further
• Only need k operations to compute the minimum value!
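To illustrate why the factorization matters, here is a small sketch with made-up single-variable potentials g (the slide's own example equation did not survive extraction): when the objective splits into a sum of per-variable terms, the minimum distributes over the sum, so brute force over all kⁿ assignments is unnecessary.

```python
import itertools

k = 4  # each variable takes one of k values
# Hypothetical separable potentials: f(x1, x2, x3) = g1(x1) + g2(x2) + g3(x3)
g = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]

# Brute force: k**n evaluations of the objective
brute = min(sum(gi[xi] for gi, xi in zip(g, x))
            for x in itertools.product(range(k), repeat=len(g)))

# Exploiting the factorization: min distributes over the sum (k ops per variable)
fast = sum(min(gi) for gi in g)

assert brute == fast  # both equal 1 + 2 + 3 = 6
```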

SLIDE 16

Reparameterizations

• We can rewrite the objective function in terms of a set of messages
• This does not change the objective function as long as the messages are real-valued at each x
• The objective function is reparameterized in terms of the messages
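The rewritten objective appeared as an image. In the usual message-passing notation (my reconstruction, not the slide's exact formula), a reparameterization has the form:

```latex
f(x) \;=\; \sum_i \Big( \phi_i(x_i) \;+\; \sum_{\alpha \ni i} m_{\alpha \to i}(x_i) \Big)
\;+\; \sum_{\alpha} \Big( \psi_\alpha(x_\alpha) \;-\; \sum_{i \in \alpha} m_{\alpha \to i}(x_i) \Big)
```

The message terms cancel, so the objective function is unchanged for any real-valued messages.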

SLIDE 17

Reparameterizations

• We can rewrite the objective function in the same way
• The reparameterization has the same factor graph as the original factorization
• Many message passing algorithms produce a reparameterization upon convergence

SLIDE 18

The Splitting Reparameterization

• Let c be a vector of non-zero reals
• If c is a vector of positive integers, then we could view this as a factorization in two ways:
  • Over the same factor graph as the original potentials
  • Over a factor graph where each potential has been “split” into several pieces

SLIDE 19

The Splitting Reparameterization

[Figure: the original factor graph on variables 1, 2, 3, and the factor graph resulting from “splitting” each of the pairwise potentials 3 times]

SLIDE 20

The Splitting Reparameterization

• Beliefs:
• Reparameterization:

SLIDE 21

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 22

Lower Bounds

• Can lower bound the objective function with these reparameterizations
• Find the collection of messages that maximizes this lower bound
• The lower bound is a concave function of the messages
• Use coordinate ascent or subgradient methods
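The bound itself was an image on the slide. In the notation above (my reconstruction), it follows because the minimum of a sum is at least the sum of the minimums, applied term by term to a reparameterization:

```latex
\min_x f(x) \;\ge\; \sum_i \min_{x_i} \Big( \phi_i(x_i) + \sum_{\alpha \ni i} m_{\alpha \to i}(x_i) \Big)
\;+\; \sum_{\alpha} \min_{x_\alpha} \Big( \psi_\alpha(x_\alpha) - \sum_{i \in \alpha} m_{\alpha \to i}(x_i) \Big)
```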

SLIDE 23

Lower Bounds and the MAP LP

• The MAP LP is equivalent to minimizing f
• Its dual provides a lower bound on f
• Messages are a side-effect of certain dual formulations

SLIDE 24

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 25

The Splitting Algorithm

• A local message passing algorithm for the splitting reparameterization
• Contains the min-sum algorithm as a special case
• For the integer case, can be derived from the min-sum update equations

SLIDE 26

The Splitting Algorithm

• For certain choices of c, an asynchronous version of the splitting algorithm can be shown to be a block coordinate ascent scheme for the lower bound

SLIDE 27

Asynchronous Splitting Algorithm

[Figures, slides 27-29: successive message updates on the factor graph with variables 1, 2, 3]

SLIDE 30

Coordinate Ascent

• Guaranteed to converge
• Does not necessarily maximize the lower bound
  • Can get stuck in a suboptimal configuration
• Can be shown to converge to the maximum in restricted cases
  • Pairwise-binary objective functions

SLIDE 31

Other Ascent Schemes

• Many other ascent algorithms are possible over different lower bounds:
  • TRW-S [Kolmogorov 2007]
  • MPLP [Globerson and Jaakkola 2007]
  • Max-Sum Diffusion [Werner 2007]
  • Norm-product [Hazan 2010]
• Not all coordinate ascent schemes are local

SLIDE 32

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 33

Constructing the Solution

• Construct an estimate, x*, of the optimal assignment from the beliefs by choosing each component to minimize the corresponding belief
• For certain choices of the vector c, if each argmin is unique, then x* minimizes f
• A simple choice of c guarantees both convergence and correctness (if the argmins are unique)

SLIDE 34

Correctness

• If the argmins are not unique, then we may not be able to construct a solution
• When does the algorithm converge to the correct minimizing assignment?

SLIDE 35

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 36

Graph Covers

• A graph H covers a graph G if there is a homomorphism from H to G that is a bijection on neighborhoods

[Figure: graph G (nodes 1, 2, 3) and a 2-cover of G (nodes 1, 2, 3, 1’, 2’, 3’)]
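A 2-cover can be sketched concretely: take two copies of each node and lift each edge of G to a perfect matching between the copies of its endpoints, either "parallel" or "crossed". The triangle and edge choices below are a hypothetical small example, not the slide's figure; crossing a single edge of the triangle produces a connected 2-cover (a 6-cycle), and every lifted node keeps the same degree as the node it covers, which is the bijection-on-neighborhoods condition.

```python
# Edges of a small base graph G (a triangle, chosen for illustration)
G = [(1, 2), (2, 3), (1, 3)]

def two_cover(edges, crossed):
    """Lift each edge of G; the cover's nodes are (v, 0) and (v, 1)."""
    lifted = []
    for (u, v) in edges:
        if (u, v) in crossed:
            # crossed lift: connect copy 0 of u to copy 1 of v, and vice versa
            lifted += [((u, 0), (v, 1)), ((u, 1), (v, 0))]
        else:
            # parallel lift: each copy of u connects to the same copy of v
            lifted += [((u, 0), (v, 0)), ((u, 1), (v, 1))]
    return lifted

# Crossing one edge turns the two triangle copies into a single 6-cycle
H = two_cover(G, crossed={(1, 3)})
```

Since every node of H has the same degree as the node of G it projects to, messages passed on H are locally indistinguishable from messages passed on G.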

SLIDE 37

Graph Covers

• Potential functions on the cover are “lifts” of the potentials at the nodes they cover

[Figure: graph G (nodes 1, 2, 3) and a 2-cover of G (nodes 1, 2, 3, 1’, 2’, 3’)]

SLIDE 38

Graph Covers

• The lifted potentials define a new objective function
• Objective function:
• 2-cover objective function:

SLIDE 39

Graph Covers

• Indistinguishability: for any cover and any choice of initial messages on the original graph, there exists a choice of initial messages on the cover such that the messages passed by the splitting algorithm are identical on both graphs
• For choices of c that guarantee correctness, any assignment that uniquely minimizes each belief must also minimize the objective function corresponding to any finite cover

SLIDE 40

Maximum Weight Independent Set

[Figure: graph G (nodes 1, 2, 3) and a 2-cover of G (nodes 1, 2, 3, 1’, 2’, 3’)]

SLIDE 41

Maximum Weight Independent Set

[Figures, slides 41-42: node weights (5s and 2s) on graph G and on its 2-cover]

SLIDE 43

Maximum Weight Independent Set

[Figures, slides 43-44: node weights (3s and 2s) on graph G and on its 2-cover]
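The slides' exact weights are not recoverable from this transcript, but the phenomenon they illustrate can be reproduced with a brute-force check on an illustrative instance (my own weights, all equal to 2): on a triangle, any independent set is a single node, while on its 2-cover (a 6-cycle) an alternating set of three nodes is independent. The cover's optimum therefore beats twice the base graph's optimum, so the cover's solution does not project down, which is exactly the failure mode graph covers expose.

```python
import itertools

def mwis(nodes, edges, w):
    """Brute-force maximum weight independent set (fine for tiny graphs)."""
    best = 0
    for r in range(len(nodes) + 1):
        for s in itertools.combinations(nodes, r):
            ss = set(s)
            if all(not (u in ss and v in ss) for u, v in edges):
                best = max(best, sum(w[v] for v in s))
    return best

# Illustrative triangle G with all node weights 2
G_nodes, G_edges = [1, 2, 3], [(1, 2), (2, 3), (1, 3)]
w = {1: 2, 2: 2, 3: 2}

# Its connected 2-cover is a 6-cycle
H_nodes = list(range(6))
H_edges = [(i, (i + 1) % 6) for i in range(6)]
wH = {i: 2 for i in range(6)}

# Per copy, the cover's optimum strictly beats the base graph's optimum
assert mwis(H_nodes, H_edges, wH) > 2 * mwis(G_nodes, G_edges, w)
```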

SLIDE 45

More Graph Covers

• If covers of the factor graph have different solutions:
  • The splitting algorithm cannot converge to the correct answer for choices of c that guarantee correctness
  • The min-sum algorithm may converge to an assignment that is optimal on a cover
• There are applications for which the splitting algorithm always works
  • Minimum cuts, shortest paths, and more…

SLIDE 46

Graph Covers

• Suppose f factorizes over a set with corresponding factor graph G, and the choice of c guarantees correctness
• Theorem: the splitting algorithm can only converge to beliefs that have unique argmins if
  • f is uniquely minimized at the assignment x*
  • The objective function corresponding to every finite cover H of G has a unique minimum that is a lift of x*

SLIDE 47

Graph Covers

• This result suggests that:
  • There is a close link between “good” factorizations and the difficulty of a problem
  • Convergent and correct algorithms are not ideal for all applications
  • Convex functions can be covered by functions that are not convex

SLIDE 48

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 49

Quadratic Minimization

• A symmetric positive definite matrix implies a unique minimum
• Minimized at the solution of the associated linear system
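In symbols (Γ and h are my notation; the slide's equation was an image), the quadratic objective and its minimizer are:

```latex
f(x) \;=\; \tfrac{1}{2}\, x^{\mathsf{T}} \Gamma x \;-\; h^{\mathsf{T}} x,
\qquad
x^* \;=\; \Gamma^{-1} h \quad \text{when } \Gamma \succ 0
```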

SLIDE 50

Quadratic Minimization

• For a positive definite matrix, min-sum convergence implies a correct solution
• Min-sum is not guaranteed to converge for all symmetric positive definite matrices

SLIDE 51

Quadratic Minimization

• A symmetric matrix is scaled diagonally dominant if there exists a weight vector w > 0 such that, in every row, the weighted diagonal entry dominates the weighted sum of the off-diagonal entries
• Theorem: Γ is scaled diagonally dominant iff every finite cover of Γ is positive definite
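Written out (the inequality on the slide was an image; Γ is my notation for the matrix), the scaled diagonal dominance condition is:

```latex
\exists\, w > 0 \;\text{ such that, for each row } i:\quad
|\Gamma_{ii}|\, w_i \;>\; \sum_{j \neq i} |\Gamma_{ij}|\, w_j
```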

SLIDE 52

Quadratic Minimization

• Scaled diagonal dominance is a sufficient condition for the convergence of other iterative methods
  • Gauss-Seidel, Jacobi, and min-sum
• Suggests a generalization of scaled diagonal dominance for arbitrary convex functions
  • Purely combinatorial!
• Empirically, the splitting algorithm can always be made to converge for this problem
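As a small sketch of one of the iterative methods mentioned above, here is Jacobi iteration for the quadratic minimizer x* = Γ⁻¹h on a made-up matrix that is diagonally dominant (with weights w = 1), and hence within the convergence condition; the matrix and vector are illustrative, not from the talk.

```python
import numpy as np

# Illustrative diagonally dominant symmetric matrix and linear term
G = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 5.0]])
h = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
D = np.diag(G)  # diagonal entries of G
for _ in range(200):
    # Jacobi update: x_i <- (h_i - sum_{j != i} G_ij x_j) / G_ii
    x = (h - (G @ x - D * x)) / D

# The iterates converge to the minimizer of (1/2) x^T G x - h^T x
assert np.allclose(G @ x, h, atol=1e-8)
```

Gauss-Seidel differs only in using the freshly updated components within each sweep; min-sum on the same quadratic performs an analogous local computation via messages.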

SLIDE 53

Conclusion

• General strategy for minimization
  • Reparameterization
  • Lower bounds
  • Convergent and correct message passing algorithms
• Correctness is too strong a requirement
  • Algorithms cannot distinguish graph covers
  • Can fail to hold even for convex problems

SLIDE 54

Conclusion

• Open questions
  • Deep relationship between the “hardness” of a problem and its factorizations
  • Convergence and correctness criteria for the min-sum algorithm
  • Rates of convergence

SLIDE 55

Questions?

A draft of the thesis is available online at: http://cs-www.cs.yale.edu/homes/nruozzi/Papers/ths2.pdf