Message Passing Algorithms for Optimization. Nicholas Ruozzi. Advisor: Sekhar Tatikonda. Yale University. (PowerPoint PPT presentation)



SLIDE 1

Message Passing Algorithms for Optimization

Nicholas Ruozzi. Advisor: Sekhar Tatikonda. Yale University

SLIDE 2

The Problem

• Minimize a real-valued objective function that factorizes as a sum of potentials
• The potentials are indexed by a multiset whose elements are subsets of the indices 1, …, n
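The factorization itself appeared as an image on the slide. In the standard min-sum setup it reads as follows; the symbols 𝒜, φ, and ψ here are my notation, not recovered from the slide:

```latex
f(x_1, \dots, x_n) \;=\; \sum_{i=1}^{n} \phi_i(x_i) \;+\; \sum_{\alpha \in \mathcal{A}} \psi_\alpha(x_\alpha)
```

where each α ∈ 𝒜 is a subset of {1, …, n} and x_α denotes the subvector of variables indexed by α.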

SLIDE 3

Corresponding Graph

[Figure: factor graph on variables 1, 2, 3]

SLIDE 4

Local Message Passing Algorithms

• Pass messages on this graph to minimize f
• Distributed message passing algorithm
• Ideal for large scientific problems, sensor networks, etc.

[Figure: factor graph on variables 1, 2, 3]

SLIDE 5

The Min-Sum Algorithm

• Messages at time t:

[Figure: factor graph on variables 1, 2, 3, 4]
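The update equations were images on the slide. A standard form of the min-sum updates, in one common convention (my reconstruction, not the slide's exact formulas), is:

```latex
m^{t}_{i \to \alpha}(x_i) \;=\; \phi_i(x_i) \;+\; \sum_{\beta \ni i,\, \beta \neq \alpha} m^{t-1}_{\beta \to i}(x_i)
\qquad
m^{t}_{\alpha \to i}(x_i) \;=\; \min_{x'_\alpha \,:\, (x'_\alpha)_i = x_i} \Big[ \psi_\alpha(x'_\alpha) \;+\; \sum_{j \in \alpha,\, j \neq i} m^{t-1}_{j \to \alpha}\big((x'_\alpha)_j\big) \Big]
```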

SLIDE 6

Computing Beliefs

• The min-marginal corresponding to the ith variable is obtained by minimizing f over all of the other variables
• Beliefs approximate the min-marginals
• Estimate the optimal assignment from the argmins of the beliefs
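In the usual notation (my reconstruction; the slide's equations were images), the beliefs and the estimate are:

```latex
b^{t}_i(x_i) \;=\; \phi_i(x_i) \;+\; \sum_{\alpha \ni i} m^{t}_{\alpha \to i}(x_i),
\qquad
x^*_i \;\in\; \arg\min_{x_i} b^{t}_i(x_i)
```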

SLIDE 7

Min-Sum: Convergence Properties

• Iterations do not necessarily converge
• Always converges when the factor graph is a tree
• Converged estimates need not correspond to the optimal solution
• Performs well empirically

SLIDE 8

Previous Work

• Prior work focused on two aspects of message passing algorithms
  • Convergence
    • Coordinate ascent schemes
    • Not necessarily local message passing algorithms
  • Correctness
    • No combinatorial characterization of failure modes
    • Concerned only with global optimality

SLIDE 9

Contributions

• A new local message passing algorithm
  • Parameterized family of message passing algorithms
• Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a global optimum
• Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a local optimum

SLIDE 10

Contributions

• What makes a graphical model “good”?
• Combinatorial understanding of the failure modes of the splitting algorithm via graph covers
  • Can be extended to other iterative algorithms
• Techniques for handling objective functions for which the known convergent algorithms fail
  • Reparameterization-centric approach

SLIDE 11

Publications

Convergent and correct message passing schemes for optimization problems over graphical models.
Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), July 2010.

Fixing Max-Product: A Unified Look at Message Passing Algorithms (invited talk).
Proceedings of the Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, September 2010.

Unconstrained minimization of quadratic functions via min-sum.
Proceedings of the Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, March 2010.

Graph covers and quadratic minimization.
Proceedings of the Forty-Seventh Annual Allerton Conference on Communication, Control, and Computing, September 2009.

s-t paths using the min-sum algorithm.
Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, Control, and Computing, September 2008.

SLIDE 12

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 13

The Problem

• Minimize a real-valued objective function that factorizes as a sum of potentials
• The potentials are indexed by a multiset whose elements are subsets of the indices 1, …, n

SLIDE 14

Factorizations

• Some factorizations are better than others
• If xi takes one of k values, this requires at most 2k² + k operations
SLIDE 15

Factorizations

• Some factorizations are better than others
• Suppose the objective function factorizes further
• Only need k operations to compute the minimum value!
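To illustrate why the factorization matters, here is a small sketch with made-up single-variable potentials g (the slide's own example equation did not survive extraction): when the objective splits into a sum of per-variable terms, the minimum distributes over the sum, so brute force over all kⁿ assignments is unnecessary.

```python
import itertools

k = 4  # each variable takes one of k values
# Hypothetical separable potentials: f(x1, x2, x3) = g1(x1) + g2(x2) + g3(x3)
g = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]

# Brute force: k**n evaluations of the objective
brute = min(sum(gi[xi] for gi, xi in zip(g, x))
            for x in itertools.product(range(k), repeat=len(g)))

# Exploiting the factorization: min distributes over the sum (k ops per variable)
fast = sum(min(gi) for gi in g)

assert brute == fast  # both equal 1 + 2 + 3 = 6
```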

SLIDE 16

Reparameterizations

• We can rewrite the objective function in terms of a set of messages
• This does not change the objective function as long as the messages are real-valued at each x
• The objective function is reparameterized in terms of the messages
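The rewritten objective appeared as an image. In the usual message-passing notation (my reconstruction, not the slide's exact formula), a reparameterization has the form:

```latex
f(x) \;=\; \sum_i \Big( \phi_i(x_i) \;+\; \sum_{\alpha \ni i} m_{\alpha \to i}(x_i) \Big)
\;+\; \sum_{\alpha} \Big( \psi_\alpha(x_\alpha) \;-\; \sum_{i \in \alpha} m_{\alpha \to i}(x_i) \Big)
```

The message terms cancel, so the objective function is unchanged for any real-valued messages.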

SLIDE 17

Reparameterizations

• We can rewrite the objective function in the same way
• The reparameterization has the same factor graph as the original factorization
• Many message passing algorithms produce a reparameterization upon convergence

SLIDE 18

The Splitting Reparameterization

• Let c be a vector of non-zero reals
• If c is a vector of positive integers, then we could view this as a factorization in two ways:
  • Over the same factor graph as the original potentials
  • Over a factor graph where each potential has been “split” into several pieces

SLIDE 19

The Splitting Reparameterization

[Figure: the original factor graph on variables 1, 2, 3, and the factor graph resulting from “splitting” each of the pairwise potentials 3 times]

SLIDE 20

The Splitting Reparameterization

• Beliefs:
• Reparameterization:

SLIDE 21

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 22

Lower Bounds

• Can lower bound the objective function with these reparameterizations
• Find the collection of messages that maximizes this lower bound
• The lower bound is a concave function of the messages
• Use coordinate ascent or subgradient methods
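The bound itself was an image on the slide. In the notation above (my reconstruction), it follows because the minimum of a sum is at least the sum of the minimums, applied term by term to a reparameterization:

```latex
\min_x f(x) \;\ge\; \sum_i \min_{x_i} \Big( \phi_i(x_i) + \sum_{\alpha \ni i} m_{\alpha \to i}(x_i) \Big)
\;+\; \sum_{\alpha} \min_{x_\alpha} \Big( \psi_\alpha(x_\alpha) - \sum_{i \in \alpha} m_{\alpha \to i}(x_i) \Big)
```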

SLIDE 23

Lower Bounds and the MAP LP

• The MAP LP is equivalent to minimizing f
• Its dual provides a lower bound on f
• Messages are a side-effect of certain dual formulations

SLIDE 24

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 25

The Splitting Algorithm

• A local message passing algorithm for the splitting reparameterization
• Contains the min-sum algorithm as a special case
• For the integer case, can be derived from the min-sum update equations

SLIDE 26

The Splitting Algorithm

• For certain choices of c, an asynchronous version of the splitting algorithm can be shown to be a block coordinate ascent scheme for the lower bound

SLIDE 27

Asynchronous Splitting Algorithm

[Figures, slides 27-29: successive message updates on the factor graph with variables 1, 2, 3]

SLIDE 30

Coordinate Ascent

• Guaranteed to converge
• Does not necessarily maximize the lower bound
  • Can get stuck in a suboptimal configuration
• Can be shown to converge to the maximum in restricted cases
  • Pairwise-binary objective functions

SLIDE 31

Other Ascent Schemes

• Many other ascent algorithms are possible over different lower bounds:
  • TRW-S [Kolmogorov 2007]
  • MPLP [Globerson and Jaakkola 2007]
  • Max-Sum Diffusion [Werner 2007]
  • Norm-product [Hazan 2010]
• Not all coordinate ascent schemes are local

SLIDE 32

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 33

Constructing the Solution

• Construct an estimate, x*, of the optimal assignment from the beliefs by choosing each component to minimize the corresponding belief
• For certain choices of the vector c, if each argmin is unique, then x* minimizes f
• A simple choice of c guarantees both convergence and correctness (if the argmins are unique)

SLIDE 34

Correctness

• If the argmins are not unique, then we may not be able to construct a solution
• When does the algorithm converge to the correct minimizing assignment?

SLIDE 35

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 36

Graph Covers

• A graph H covers a graph G if there is a homomorphism from H to G that is a bijection on neighborhoods

[Figure: graph G (nodes 1, 2, 3) and a 2-cover of G (nodes 1, 2, 3, 1’, 2’, 3’)]
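A 2-cover can be sketched concretely: take two copies of each node and lift each edge of G to a perfect matching between the copies of its endpoints, either "parallel" or "crossed". The triangle and edge choices below are a hypothetical small example, not the slide's figure; crossing a single edge of the triangle produces a connected 2-cover (a 6-cycle), and every lifted node keeps the same degree as the node it covers, which is the bijection-on-neighborhoods condition.

```python
# Edges of a small base graph G (a triangle, chosen for illustration)
G = [(1, 2), (2, 3), (1, 3)]

def two_cover(edges, crossed):
    """Lift each edge of G; the cover's nodes are (v, 0) and (v, 1)."""
    lifted = []
    for (u, v) in edges:
        if (u, v) in crossed:
            # crossed lift: connect copy 0 of u to copy 1 of v, and vice versa
            lifted += [((u, 0), (v, 1)), ((u, 1), (v, 0))]
        else:
            # parallel lift: each copy of u connects to the same copy of v
            lifted += [((u, 0), (v, 0)), ((u, 1), (v, 1))]
    return lifted

# Crossing one edge turns the two triangle copies into a single 6-cycle
H = two_cover(G, crossed={(1, 3)})
```

Since every node of H has the same degree as the node of G it projects to, messages passed on H are locally indistinguishable from messages passed on G.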

SLIDE 37

Graph Covers

• Potential functions on the cover are “lifts” of the potentials at the nodes they cover

[Figure: graph G (nodes 1, 2, 3) and a 2-cover of G (nodes 1, 2, 3, 1’, 2’, 3’)]

SLIDE 38

Graph Covers

• The lifted potentials define a new objective function
• Objective function:
• 2-cover objective function:

SLIDE 39

Graph Covers

• Indistinguishability: for any cover and any choice of initial messages on the original graph, there exists a choice of initial messages on the cover such that the messages passed by the splitting algorithm are identical on both graphs
• For choices of c that guarantee correctness, any assignment that uniquely minimizes each belief must also minimize the objective function corresponding to any finite cover

SLIDE 40

Maximum Weight Independent Set

[Figure: graph G (nodes 1, 2, 3) and a 2-cover of G (nodes 1, 2, 3, 1’, 2’, 3’)]

SLIDE 41

Maximum Weight Independent Set

[Figures, slides 41-42: node weights (5s and 2s) on graph G and on its 2-cover]

SLIDE 43

Maximum Weight Independent Set

[Figures, slides 43-44: node weights (3s and 2s) on graph G and on its 2-cover]
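The slides' exact weights are not recoverable from this transcript, but the phenomenon they illustrate can be reproduced with a brute-force check on an illustrative instance (my own weights, all equal to 2): on a triangle, any independent set is a single node, while on its 2-cover (a 6-cycle) an alternating set of three nodes is independent. The cover's optimum therefore beats twice the base graph's optimum, so the cover's solution does not project down, which is exactly the failure mode graph covers expose.

```python
import itertools

def mwis(nodes, edges, w):
    """Brute-force maximum weight independent set (fine for tiny graphs)."""
    best = 0
    for r in range(len(nodes) + 1):
        for s in itertools.combinations(nodes, r):
            ss = set(s)
            if all(not (u in ss and v in ss) for u, v in edges):
                best = max(best, sum(w[v] for v in s))
    return best

# Illustrative triangle G with all node weights 2
G_nodes, G_edges = [1, 2, 3], [(1, 2), (2, 3), (1, 3)]
w = {1: 2, 2: 2, 3: 2}

# Its connected 2-cover is a 6-cycle
H_nodes = list(range(6))
H_edges = [(i, (i + 1) % 6) for i in range(6)]
wH = {i: 2 for i in range(6)}

# Per copy, the cover's optimum strictly beats the base graph's optimum
assert mwis(H_nodes, H_edges, wH) > 2 * mwis(G_nodes, G_edges, w)
```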

SLIDE 45

More Graph Covers

• If covers of the factor graph have different solutions:
  • The splitting algorithm cannot converge to the correct answer for choices of c that guarantee correctness
  • The min-sum algorithm may converge to an assignment that is optimal on a cover
• There are applications for which the splitting algorithm always works
  • Minimum cuts, shortest paths, and more…

SLIDE 46

Graph Covers

• Suppose f factorizes over a set with corresponding factor graph G, and the choice of c guarantees correctness
• Theorem: the splitting algorithm can only converge to beliefs that have unique argmins if
  • f is uniquely minimized at the assignment x*
  • The objective function corresponding to every finite cover H of G has a unique minimum that is a lift of x*

SLIDE 47

Graph Covers

• This result suggests that:
  • There is a close link between “good” factorizations and the difficulty of a problem
  • Convergent and correct algorithms are not ideal for all applications
  • Convex functions can be covered by functions that are not convex

SLIDE 48

Outline

• Reparameterizations
• Lower Bounds
• Convergent Message Passing
• Finding a Minimizing Assignment
• Graph Covers
• Quadratic Minimization

SLIDE 49

Quadratic Minimization

• A symmetric positive definite matrix implies a unique minimum
• Minimized at the solution of the associated linear system
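In symbols (Γ and h are my notation; the slide's equation was an image), the quadratic objective and its minimizer are:

```latex
f(x) \;=\; \tfrac{1}{2}\, x^{\mathsf{T}} \Gamma x \;-\; h^{\mathsf{T}} x,
\qquad
x^* \;=\; \Gamma^{-1} h \quad \text{when } \Gamma \succ 0
```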

SLIDE 50

Quadratic Minimization

• For a positive definite matrix, min-sum convergence implies a correct solution
• Min-sum is not guaranteed to converge for all symmetric positive definite matrices

SLIDE 51

Quadratic Minimization

• A symmetric matrix is scaled diagonally dominant if there exists a weight vector w > 0 such that, in every row, the weighted diagonal entry dominates the weighted sum of the off-diagonal entries
• Theorem: Γ is scaled diagonally dominant iff every finite cover of Γ is positive definite
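Written out (the inequality on the slide was an image; Γ is my notation for the matrix), the scaled diagonal dominance condition is:

```latex
\exists\, w > 0 \;\text{ such that, for each row } i:\quad
|\Gamma_{ii}|\, w_i \;>\; \sum_{j \neq i} |\Gamma_{ij}|\, w_j
```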

SLIDE 52

Quadratic Minimization

• Scaled diagonal dominance is a sufficient condition for the convergence of other iterative methods
  • Gauss-Seidel, Jacobi, and min-sum
• Suggests a generalization of scaled diagonal dominance for arbitrary convex functions
  • Purely combinatorial!
• Empirically, the splitting algorithm can always be made to converge for this problem
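As a small sketch of one of the iterative methods mentioned above, here is Jacobi iteration for the quadratic minimizer x* = Γ⁻¹h on a made-up matrix that is diagonally dominant (with weights w = 1), and hence within the convergence condition; the matrix and vector are illustrative, not from the talk.

```python
import numpy as np

# Illustrative diagonally dominant symmetric matrix and linear term
G = np.array([[4.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 5.0]])
h = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
D = np.diag(G)  # diagonal entries of G
for _ in range(200):
    # Jacobi update: x_i <- (h_i - sum_{j != i} G_ij x_j) / G_ii
    x = (h - (G @ x - D * x)) / D

# The iterates converge to the minimizer of (1/2) x^T G x - h^T x
assert np.allclose(G @ x, h, atol=1e-8)
```

Gauss-Seidel differs only in using the freshly updated components within each sweep; min-sum on the same quadratic performs an analogous local computation via messages.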

SLIDE 53

Conclusion

• General strategy for minimization
  • Reparameterization
  • Lower bounds
  • Convergent and correct message passing algorithms
• Correctness is too strong a requirement
  • Algorithms cannot distinguish graph covers
  • Can fail to hold even for convex problems

SLIDE 54

Conclusion

• Open questions
  • Deep relationship between the “hardness” of a problem and its factorizations
  • Convergence and correctness criteria for the min-sum algorithm
  • Rates of convergence

SLIDE 55

Questions?

A draft of the thesis is available online at: http://cs-www.cs.yale.edu/homes/nruozzi/Papers/ths2.pdf