Rank optimality for the Burer-Monteiro factorization

  1. Rank optimality for the Burer-Monteiro factorization. Irène Waldspurger, CNRS and CEREMADE (Université Paris Dauphine), Équipe MOKAPLAN (INRIA). Joint work with Alden Waters (Bernoulli Institute, Rijksuniversiteit Groningen). April 3, 2019. Imaging and machine learning, The mathematics of imaging semester, Institut Henri Poincaré.

  2. Introduction: Semidefinite programming
       minimize $\operatorname{Trace}(CX)$ such that $\mathcal{A}(X) = b$, $X \succeq 0$.
     Here,
     ◮ $X$, the unknown, is an $n \times n$ matrix;
     ◮ $C$ is a fixed $n \times n$ matrix (cost matrix);
     ◮ $\mathcal{A} : \mathrm{Sym}_n \to \mathbb{R}^m$ is linear;
     ◮ $b$ is a fixed vector in $\mathbb{R}^m$.
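
One concrete way to encode the linear map, assumed here and not stated on the slide, is $\mathcal{A}(X)_k = \operatorname{Trace}(A_k X)$ for fixed symmetric matrices $A_1, \dots, A_m$. The minimal pure-Python sketch below evaluates the objective and the equality constraints for a dense $X$; all function names and the toy data are hypothetical, and the condition $X \succeq 0$ is deliberately not checked.

```python
def trace_product(A, B):
    """Trace(A B) for square matrices stored as lists of rows."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))

def constraint_map(As, X):
    """A(X) = (Trace(A_1 X), ..., Trace(A_m X)) for a list of matrices As."""
    return [trace_product(Ak, X) for Ak in As]

def satisfies_equalities(As, b, X, tol=1e-9):
    """Check A(X) = b up to a tolerance (X >= 0 is not tested here)."""
    values = constraint_map(As, X)
    return len(values) == len(b) and all(abs(v - bk) <= tol for v, bk in zip(values, b))

# Hypothetical instance with n = 2, m = 2: both diagonal entries must equal 1.
As = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
b = [1.0, 1.0]
C = [[0.0, 1.0], [1.0, 0.0]]
X = [[1.0, -1.0], [-1.0, 1.0]]
print(trace_product(C, X), satisfies_equalities(As, b, X))  # -2.0 True
```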

  3. Introduction: Motivations
     Various difficult problems can be “lifted” to SDPs, and solving these lifted SDPs may solve the original problems.
     Particularly important example: the relaxation of MaxCut,
       minimize $\operatorname{Trace}(CX)$ such that $\operatorname{diag}(X) = \mathbf{1}$, $X \succeq 0$.
     It relaxes the Maximum Cut problem from graph theory [Delorme and Poljak, 1993]. It also appears in phase retrieval, $\mathbb{Z}_2$ synchronization, ...
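
To see why this is a relaxation, take a sign vector $x \in \{-1, 1\}^n$ (one side of the cut per sign): $X = xx^T$ satisfies $\operatorname{diag}(X) = \mathbf{1}$ and $X \succeq 0$, and $\operatorname{Trace}(CX) = x^T C x$. The sketch below assumes the common choice $C = -L/4$, with $L$ the graph Laplacian (an assumption; the slide does not specify $C$), for which $x^T C x$ equals minus the weight of the cut. It brute-forces a hypothetical 4-cycle; the helper names are ours.

```python
from itertools import product

def laplacian(n, edges):
    """Graph Laplacian L = D - W (dense) of a weighted undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L

def cut_value(x, edges):
    """Weight of the edges whose endpoints receive different signs."""
    return sum(w for i, j, w in edges if x[i] != x[j])

def lifted_objective(C, x):
    """Trace(C X) for the rank-one lift X = x x^T, i.e. x^T C x."""
    n = len(x)
    return sum(C[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Hypothetical unit-weight 4-cycle; C = -L/4 is the assumed MaxCut cost matrix.
n = 4
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
C = [[-v / 4 for v in row] for row in laplacian(n, edges)]

best = min(product([-1, 1], repeat=n), key=lambda x: lifted_objective(C, x))
print(best, cut_value(best, edges))  # (-1, 1, -1, 1) 4.0
```

Dropping the rank-one requirement on $X$ gives the SDP, whose optimal value is therefore at most minus the maximum cut.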

  4. Introduction: Numerical solvers
     SDPs can be solved to a given precision in polynomial time, but the degree of the polynomial may be large.
     Interior-point solvers, for instance, have a per-iteration complexity of $O(n^4)$ in full generality (when $m$ and $n$ are of the same order). First-order solvers, applied to a smoothed problem, have an $O(n^3)$ per-iteration complexity, but require more iterations.
     → Numerically, high-dimensional SDPs are difficult to solve.

  5. Introduction: Exploiting the low rank
     To speed up these algorithms, we exploit the structure of the problem. Here, the “structure” we consider is the existence of a low-rank solution.
     ◮ There is always a solution with rank $r_{\mathrm{opt}}$ at most $\left\lfloor \sqrt{2m + 1/4} - 1/2 \right\rfloor$ [Pataki, 1998].
     ◮ In many situations, there is actually a solution with rank $r_{\mathrm{opt}} = O(1)$.
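
The bound comes from the inequality $r(r+1)/2 \le m$, and the floor expression is the largest integer $r$ satisfying it. A small exact-integer sketch (the function name is ours) avoids floating-point rounding in the square root:

```python
from math import isqrt

def pataki_rank_bound(m: int) -> int:
    """Largest integer r with r (r + 1) / 2 <= m, i.e. floor(sqrt(2m + 1/4) - 1/2)."""
    # r(r+1)/2 <= m  <=>  (2r + 1)^2 <= 8m + 1  <=>  2r + 1 <= isqrt(8m + 1).
    return (isqrt(8 * m + 1) - 1) // 2

print(pataki_rank_bound(1000))  # 44
```

For the MaxCut relaxation, $m = n$ (one constraint per diagonal entry), so $n = 1000$ gives a bound of 44.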

  6. Introduction: Burer-Monteiro factorization
     We focus on one heuristic that takes advantage of the low rank: the Burer-Monteiro factorization [Burer and Monteiro, 2003].
     If there is a solution with rank $r_{\mathrm{opt}}$, we can write $X$ in the form $X = VV^T$, with $V$ an $n \times p$ matrix and $p \geq r_{\mathrm{opt}}$.
     → We optimize over $V$ instead of optimizing over $X$.
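
The condition $p \geq r_{\mathrm{opt}}$ is what makes the factorization possible: any $n \times r_{\mathrm{opt}}$ factor of the solution can be padded with zero columns without changing $VV^T$. A toy pure-Python check, with hypothetical names and data:

```python
def gram(V):
    """Return X = V V^T for V stored as a list of rows."""
    return [[sum(a * b for a, b in zip(vi, vj)) for vj in V] for vi in V]

def pad_factor(W, p):
    """Embed an n x r factor into n x p (p >= r) by appending zero columns."""
    r = len(W[0])
    if p < r:
        raise ValueError("p must be at least the number of columns of W")
    return [list(row) + [0.0] * (p - r) for row in W]

# Hypothetical rank-one solution X = x x^T with n = 3, factorized with p = 4.
W = [[1.0], [-1.0], [1.0]]
V = pad_factor(W, 4)
print(gram(V) == gram(W))  # True
```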

  7. Introduction
       minimize $\operatorname{Trace}(CX)$ for $X \in \mathbb{R}^{n \times n}$ such that $\mathcal{A}(X) = b$, $X \succeq 0$
       ⇓
       minimize $\operatorname{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
     Remark: the factorization rank $p$ must be chosen. It can differ from $r_{\mathrm{opt}}$, the rank of the solution.
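
In the factorized problem, the objective can be evaluated as $\operatorname{Trace}(CVV^T) = \operatorname{Trace}(V^T C V)$ using only the $n \times p$ matrix $CV$, so the $n \times n$ matrix $X$ is never stored. A pure-Python sketch with hypothetical data:

```python
def matmul(A, B):
    """Dense product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def bm_objective(C, V):
    """Trace(C V V^T) = Trace(V^T C V), computed from C V without forming V V^T."""
    CV = matmul(C, V)
    return sum(v * cv for v_row, cv_row in zip(V, CV) for v, cv in zip(v_row, cv_row))

# Hypothetical data: n = 3, p = 2.
C = [[2.0, 1.0, 0.0], [1.0, 0.0, -1.0], [0.0, -1.0, 3.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(bm_objective(C, V))  # 6.0
```

With a sparse $C$, computing $CV$ costs on the order of $p$ times the number of nonzeros of $C$, which is one reason a small $p$ is attractive.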

  8. Introduction
       minimize $\operatorname{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
     We assume that $\{V \in \mathbb{R}^{n \times p} : \mathcal{A}(VV^T) = b\}$ is a “nice” manifold.
     → Riemannian optimization algorithms.
     Main advantage of the factorized formulation: the number of variables is no longer $O(n^2)$ but $O(np)$, with possibly $p \ll n$.
     → Less computationally demanding algorithms can be used.
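
For the MaxCut constraints $\operatorname{diag}(VV^T) = \mathbf{1}$ (used here only as an example), the set is a product of $n$ unit spheres in $\mathbb{R}^p$, one per row of $V$. The sketch below gives a feasibility test, a retraction by row normalization and the projection onto the tangent space, which are basic ingredients of Riemannian methods on this set; the names are hypothetical and the code assumes nonzero rows.

```python
import math

def is_feasible(V, tol=1e-9):
    """diag(V V^T) = 1 holds iff every row of V has unit Euclidean norm."""
    return all(abs(sum(x * x for x in row) - 1.0) <= tol for row in V)

def retract(V):
    """Map V back onto the set by rescaling each (nonzero) row to unit norm."""
    return [[x / math.sqrt(sum(y * y for y in row)) for x in row] for row in V]

def project_tangent(V, Z):
    """Remove from each row of Z its component along the matching unit row of V."""
    out = []
    for v, z in zip(V, Z):
        ip = sum(a * b for a, b in zip(v, z))
        out.append([b - ip * a for a, b in zip(v, z)])
    return out

n, p = 1000, 3
print("SDP variables:", n * (n + 1) // 2, "| factorized variables:", n * p)
# prints: SDP variables: 500500 | factorized variables: 3000
```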

  9. Introduction
       minimize $\operatorname{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
     Main drawback of the factorized formulation: unlike the SDP, this problem is non-convex.
     → Riemannian optimization algorithms may get stuck at a critical point instead of finding a global minimizer.
     Whether this issue arises depends on the factorization rank $p$.
     ⇒ How to choose $p$?
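
A short check makes the non-convexity concrete on the MaxCut constraint set (an illustrative assumption): if $V$ is feasible then so is $-V$, and both give the same $X = VV^T$, but their midpoint is the zero matrix, which is infeasible. The SDP feasible set, in contrast, is convex. Toy data and hypothetical names:

```python
import math

def is_feasible(V, tol=1e-9):
    """MaxCut constraint diag(V V^T) = 1: every row has unit norm."""
    return all(abs(sum(x * x for x in row) - 1.0) <= tol for row in V)

def gram(V):
    """X = V V^T."""
    return [[sum(a * b for a, b in zip(vi, vj)) for vj in V] for vi in V]

V = [[1.0, 0.0], [0.0, 1.0], [math.sqrt(0.5), math.sqrt(0.5)]]      # feasible point
W = [[-x for x in row] for row in V]                                 # -V, also feasible
mid = [[(a + b) / 2 for a, b in zip(r, s)] for r, s in zip(V, W)]    # zero matrix

print(is_feasible(V), is_feasible(W), is_feasible(mid))  # True True False
print(gram(V) == gram(W))                                # True
```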

  12. Introduction: Outline
     1. Literature review
        ◮ In practice, algorithms work when $p = O(r_{\mathrm{opt}})$.
        ◮ In particular situations, this phenomenon is understood.
        ◮ In a general setting, there are no guarantees for $p \lesssim \sqrt{2m}$.
        ◮ Why this gap?
     2. Optimal rank for the Burer-Monteiro formulation
        ◮ Up to a minor improvement, $p \approx \sqrt{2m}$ is the optimal rank for which general guarantees can be derived.
        ◮ Consequently, when $p \lesssim \sqrt{2m}$, Riemannian optimization algorithms cannot be certified correct without assumptions on $C$.
     3. Open questions

  13. Literature review: Empirical observations
     1. [Burer and Monteiro, 2003] Numerical experiments on various problems, notably MaxCut and minimum bisection relaxations. The factorization rank is $p \approx \sqrt{2m}$, and the algorithms always find a global minimizer. (The authors do not test smaller values of $p$.)
     2. [Journée, Bach, Absil, and Sepulchre, 2010] Numerical experiments on MaxCut relaxations (with a particular initialization scheme). The algorithm proposed by the authors always finds a global minimizer when $p = r_{\mathrm{opt}}$.

  14. Literature review: Empirical observations (continued)
     3. [Boumal, 2015] Numerical experiments on problems coming from orthogonal synchronization. Here, $r_{\mathrm{opt}} = 3$ and the algorithm finds the global minimizer as soon as $p \geq 5$.
     4. Similar results on “SDP-like” problems. See for example [Mishra, Meyer, Bonnabel, and Sepulchre, 2014].

  15. Literature review: Theoretical explanations in particular cases
     [Bandeira, Boumal, and Voroninski, 2016] SDP instances coming from $\mathbb{Z}_2$ synchronization and community detection problems, under specific statistical assumptions.
     → With high probability, $r_{\mathrm{opt}} = 1$.
     → If $p = 2$, Riemannian algorithms find the global minimizer.
     Other particular SDP-like problems have been studied.
     → Under strong assumptions, $p \geq r_{\mathrm{opt}}$ is enough for a global minimizer to be found [Ge, Lee, and Ma, 2016] ...

  17. Literature review: General case, one main result [Boumal, Voroninski, and Bandeira, 2018]
       minimize $\operatorname{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
     Main hypothesis (approximately):
       $\mathcal{M}_p \overset{\text{def}}{=} \{V \in \mathbb{R}^{n \times p} : \mathcal{A}(VV^T) = b\}$ is a manifold.
     [More precisely: for all $V \in \mathcal{M}_p$, the linear map $\varphi_V : \dot V \in \mathbb{R}^{n \times p} \mapsto \mathcal{A}(V\dot V^T + \dot V V^T) \in \mathbb{R}^m$ is surjective.]
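
As a worked instance (our own illustration, assuming the MaxCut constraints $\mathcal{A}(X) = \operatorname{diag}(X)$ and $b = \mathbf{1}$, so that $m = n$), write $v_i$ and $\dot v_i$ for the $i$-th rows of $V$ and $\dot V$. For any target $y \in \mathbb{R}^n$:

```latex
\varphi_V(\dot V)
  = \operatorname{diag}\!\left(V\dot V^T + \dot V V^T\right)
  = \left(2\langle v_i, \dot v_i\rangle\right)_{1 \le i \le n},
\qquad
\dot v_i = \tfrac{y_i}{2}\, v_i
\;\Longrightarrow\;
\varphi_V(\dot V)_i = y_i \,\|v_i\|^2 = y_i .
```

Since every $V \in \mathcal{M}_p$ has unit-norm rows, $\varphi_V$ is surjective for every $p \geq 1$, so the MaxCut relaxation satisfies this hypothesis.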

  18. Literature review: General case, one main result [Boumal, Voroninski, and Bandeira, 2018]
       minimize $\operatorname{Trace}(CVV^T)$ for $V \in \mathcal{M}_p$.
     Riemannian optimization algorithms typically converge to second-order critical points. A matrix $V_0 \in \mathcal{M}_p$ is a second-order critical point if
     ◮ $\nabla f_C(V_0) = 0_{n,p}$;
     ◮ $\operatorname{Hess} f_C(V_0) \succeq 0$,
     where $f_C \overset{\text{def}}{=} \big(V \in \mathcal{M}_p \mapsto \operatorname{Trace}(CVV^T)\big)$.
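
A minimal sketch of Riemannian gradient descent, assuming the MaxCut constraints (rows of unit norm), a fixed step size and row normalization as the retraction; these choices are ours and are not the specific algorithms cited in the slides. It reports the norm of the Riemannian gradient, which measures first-order criticality only. The instance is a hypothetical unit-weight 4-cycle with $C = -L/4$, whose SDP value is $-4$.

```python
import math
import random

def matmul(A, B):
    """Dense product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def retract(V):
    """Row normalization: maps V back onto {V : diag(V V^T) = 1}."""
    return [[x / math.sqrt(sum(y * y for y in row)) for x in row] for row in V]

def riemannian_grad(C, V):
    """Euclidean gradient 2 C V (C symmetric), projected row-wise onto the tangent space."""
    out = []
    for g, v in zip(matmul(C, V), V):
        g = [2.0 * a for a in g]
        ip = sum(a * b for a, b in zip(g, v))
        out.append([a - ip * b for a, b in zip(g, v)])
    return out

def objective(C, V):
    """f_C(V) = Trace(C V V^T)."""
    return sum(a * b for r, s in zip(V, matmul(C, V)) for a, b in zip(r, s))

def rgd_maxcut(C, p, step=0.1, iters=500, seed=0):
    """Fixed-step Riemannian gradient descent from a random feasible point."""
    rng = random.Random(seed)
    V = retract([[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in C])
    for _ in range(iters):
        G = riemannian_grad(C, V)
        V = retract([[v - step * g for v, g in zip(vr, gr)] for vr, gr in zip(V, G)])
    G = riemannian_grad(C, V)
    grad_norm = math.sqrt(sum(g * g for row in G for g in row))
    return V, objective(C, V), grad_norm

# Hypothetical instance: C = -L/4 for the unit-weight 4-cycle.
C = [[-0.5, 0.25, 0.0, 0.25],
     [0.25, -0.5, 0.25, 0.0],
     [0.0, 0.25, -0.5, 0.25],
     [0.25, 0.0, 0.25, -0.5]]
V, value, grad_norm = rgd_maxcut(C, p=3)
print(value, grad_norm)
```

A gradient norm near zero only indicates approximate first-order criticality; the second-order condition above would also require checking the Riemannian Hessian, which this sketch omits.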

  20. Literature review: General case, one main result [Boumal, Voroninski, and Bandeira, 2018]
     Theorem. Under suitable hypotheses, for almost all matrices $C$, if
       $p > \sqrt{2m + \tfrac{1}{4}} - \tfrac{1}{2}$,
     all second-order critical points of the factorized problem are global minimizers. Consequently, Riemannian optimization algorithms always find a global minimizer.
     Remark: the value of $p$ does not depend on $r_{\mathrm{opt}}$.
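
Squaring shows that the condition is equivalent to $p(p+1)/2 > m$, so the smallest admissible $p$ is one more than the bound of [Pataki, 1998]. The exact-integer sketch below (function name ours) tabulates this threshold for the MaxCut relaxation, where $m = n$:

```python
from math import isqrt, sqrt

def bm_rank_threshold(m: int) -> int:
    """Smallest integer p with p > sqrt(2m + 1/4) - 1/2, equivalently p (p + 1) / 2 > m."""
    # (isqrt(8m + 1) - 1) // 2 is the largest r with r (r + 1) / 2 <= m; add one.
    return (isqrt(8 * m + 1) - 1) // 2 + 1

# MaxCut relaxation: one constraint per diagonal entry, so m = n.
for n in (100, 10_000, 1_000_000):
    print(n, bm_rank_threshold(n), round(sqrt(2 * n), 1))
```

For $n = 100$, $10^4$ and $10^6$ this gives $p = 14$, $141$ and $1414$, which track $\sqrt{2n}$ and do not depend on $r_{\mathrm{opt}}$.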

  22. Literature review: Summary
     ◮ In empirical experiments, as well as in the few particular cases that have been studied, algorithms seem to always work when $p = O(r_{\mathrm{opt}})$.
     ◮ The only available general result guarantees that algorithms work when $p \gtrsim \sqrt{2m}$.
     As $r_{\mathrm{opt}}$ is often much smaller than $\sqrt{2m}$, this leaves a big gap.
     → Is it possible to obtain general guarantees for $p \ll \sqrt{2m}$?

  24. Optimal rank for the Burer-Monteiro factorization: Overview of our results
     ◮ A minor improvement is possible over the result by [Boumal, Voroninski, and Bandeira, 2018], but it does not change the leading-order term $p \gtrsim \sqrt{2m}$.
     ◮ With this improvement, the result is essentially optimal, even if $r_{\mathrm{opt}} \ll \sqrt{2m}$.
