Rank optimality for the Burer-Monteiro factorization


  1. Rank optimality for the Burer-Monteiro factorization
     Irène Waldspurger, CNRS and CEREMADE (Université Paris Dauphine), Équipe MOKAPLAN (INRIA)
     Joint work with Alden Waters (Bernoulli Institute, Rijksuniversiteit Groningen)
     March 10, 2020. Workshop "Optimization for machine learning", CIRM, Luminy

  2. Introduction. Semidefinite programming
     minimize $\mathrm{Trace}(CX)$ such that $\mathcal{A}(X) = b$, $X \succeq 0$.
     Here,
     ◮ $X$, the unknown, is an $n \times n$ matrix;
     ◮ $C$ is a fixed $n \times n$ matrix (cost matrix);
     ◮ $\mathcal{A} : \mathrm{Sym}_n \to \mathbb{R}^m$ is linear;
     ◮ $b$ is a fixed vector in $\mathbb{R}^m$.
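As a minimal sketch of the objects on this slide, the snippet below evaluates the cost and checks feasibility of a candidate $X$. It assumes that the linear map $\mathcal{A}$ is given by symmetric matrices $A_1, \dots, A_m$ with $\mathcal{A}(X)_i = \mathrm{Trace}(A_i X)$, a representation the slides do not spell out. The function name, the tolerance, and the example data are illustrative.

```python
import numpy as np

def sdp_cost_and_feasibility(C, A_mats, b, X, tol=1e-8):
    """Return Trace(CX) and whether X satisfies A(X) = b and X >= 0 (PSD).

    Assumption: A(X)_i = Trace(A_i X) for symmetric matrices A_i.
    """
    cost = np.trace(C @ X)
    AX = np.array([np.trace(A_i @ X) for A_i in A_mats])
    linear_ok = np.allclose(AX, b, atol=tol)
    # X is PSD iff its smallest eigenvalue is nonnegative (up to tolerance).
    psd_ok = np.linalg.eigvalsh((X + X.T) / 2).min() >= -tol
    return cost, bool(linear_ok and psd_ok)

# Example: the constraint diag(X) = 1 corresponds to A_i = e_i e_i^T and b = 1.
n = 3
A_mats = [np.outer(e, e) for e in np.eye(n)]
b = np.ones(n)
C = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
cost, feasible = sdp_cost_and_feasibility(C, A_mats, b, np.eye(n))
```

Here the identity matrix is feasible, while for instance `2 * np.eye(n)` violates the linear constraint.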

  3. Introduction. Motivations
     Various difficult problems can be "lifted" to SDPs, and solving these lifted SDPs may solve the original problems.
     A particularly important example is the relaxation of MaxCut:
     minimize $\mathrm{Trace}(CX)$ such that $\mathrm{diag}(X) = 1$, $X \succeq 0$.
     It relaxes the Maximum Cut problem from graph theory [Delorme and Poljak, 1993]. It also appears in phase retrieval, $\mathbb{Z}_2$ synchronization, ...
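To illustrate why this SDP is a relaxation, here is a small sketch on a made-up 4-node example (the matrix and the cut are hypothetical, not taken from the slides). Any cut, encoded as a sign vector $x \in \{\pm 1\}^n$, yields a feasible rank-one matrix $X = xx^T$ with $\mathrm{Trace}(CX) = x^T C x$.

```python
import numpy as np

# Hypothetical symmetric cost matrix on 4 nodes (a 4-cycle).
C = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
x = np.array([1., -1., 1., -1.])   # a cut: nodes {0, 2} versus {1, 3}
X = np.outer(x, x)                 # lifted rank-one matrix

diag_ok = np.allclose(np.diag(X), 1.0)           # diag(X) = 1
psd_ok = np.linalg.eigvalsh(X).min() >= -1e-10   # X is PSD
lifted_cost = np.trace(C @ X)                    # equals x^T C x
```

The SDP optimizes over all feasible $X$, not only the rank-one ones, so its optimal value is at most the best value attained by a cut.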

  4. Introduction. Numerical solvers
     General SDPs can be solved to arbitrary precision in polynomial time, but the order of the polynomial is large.
     Interior point solvers, for instance, have a per-iteration complexity of $O(n^4)$ in full generality (when $m$ and $n$ are of the same order).
     First-order solvers, applied to a smoothed problem, have an $O(n^3)$ complexity, but require more iterations.
     → Numerically, high-dimensional SDPs are difficult to solve.

  5. Introduction. Exploiting the low rank
     To speed up these algorithms: assume that there exists a low-rank solution and exploit this fact.
     ◮ [Pataki, 1998]: There is always a solution with rank $r_{\mathrm{opt}} \leq \left\lfloor \sqrt{2m + 1/4} - 1/2 \right\rfloor \approx \sqrt{2m}$.
     ◮ In many situations, there is actually a solution with rank $r_{\mathrm{opt}} = O(1)$.
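For a sense of scale, the bound $\lfloor \sqrt{2m + 1/4} - 1/2 \rfloor$ can be evaluated exactly with integer arithmetic, since it is the largest integer $r$ with $r(r+1)/2 \leq m$. The helper name below is illustrative.

```python
import math

def pataki_rank_bound(m: int) -> int:
    """floor(sqrt(2m + 1/4) - 1/2), computed exactly with integers.

    This is the largest integer r with r(r+1)/2 <= m.
    """
    return (math.isqrt(8 * m + 1) - 1) // 2

# Print m, the bound, and sqrt(2m) side by side.
for m in (10, 100, 10_000):
    r = pataki_rank_bound(m)
    print(m, r, round(math.sqrt(2 * m), 1))
```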

  6. Introduction. Exploiting the low rank
     Two main strategies:
     ◮ Frank-Wolfe methods [Frank and Wolfe, 1956];
     ◮ Burer-Monteiro factorization [Burer and Monteiro, 2003].

  7. Introduction. Burer-Monteiro factorization
     ◮ Assume that there is a solution with rank $r_{\mathrm{opt}}$.
     ◮ Choose some integer $p \geq r_{\mathrm{opt}}$.
     ◮ Write $X$ in the form $X = VV^T$, with $V$ an $n \times p$ matrix.
     ◮ Minimize $\mathrm{Trace}(CVV^T)$ over $V$.

  8. Introduction
     minimize $\mathrm{Trace}(CX)$ for $X \in \mathbb{R}^{n \times n}$ such that $\mathcal{A}(X) = b$, $X \succeq 0$.
     $\Downarrow$
     minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
     Remark: $p$ is the factorization rank. It must be chosen, and can be equal to or larger than $r_{\mathrm{opt}}$.
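The following sketch illustrates the factorized formulation numerically, using the MaxCut constraint $\mathrm{diag}(VV^T) = 1$ as the example (an assumption made here for concreteness: it means that the rows of $V$ have unit norm). The sizes and the random cost matrix are illustrative. The cost can be evaluated from $V$ alone, without ever forming the $n \times n$ matrix $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5                      # illustrative sizes, with p << n

C = rng.standard_normal((n, n))
C = (C + C.T) / 2                  # symmetric cost matrix (random, for illustration)

V = rng.standard_normal((n, p))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit rows <=> diag(V V^T) = 1

# Trace(C V V^T) = sum_ij (C V)_ij V_ij, computed without forming X.
cost_factorized = np.sum((C @ V) * V)

# Same value through the lifted variable X = V V^T (n x n, rank at most p).
X = V @ V.T
cost_lifted = np.trace(C @ X)
```

The unknown $V$ has $np$ entries, against roughly $n^2/2$ for a symmetric $X$.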

  9. Introduction
     minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
     We assume that $\{V \in \mathbb{R}^{n \times p}, \mathcal{A}(VV^T) = b\}$ is a "nice" manifold.
     → Riemannian optimization algorithms.
     Main advantage of the factorized formulation: the number of variables is no longer $O(n^2)$, but $O(np)$, with possibly $p \ll n$.
     → Riemannian algorithms can be much faster than SDP solvers.
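A minimal sketch of what a Riemannian algorithm can look like, under assumptions that are not from the slides: MaxCut constraints $\mathrm{diag}(VV^T) = 1$ (so the manifold is a product of unit spheres, one per row of $V$), a symmetric $C$, a fixed step size, and retraction by row normalization. The function name is hypothetical, and practical solvers use more careful step-size rules or second-order methods.

```python
import numpy as np

def riemannian_gradient_descent(C, V0, step=0.1, iters=200):
    """Gradient descent on {V : rows of V have unit norm} for Trace(C V V^T).

    Assumptions (not from the slides): MaxCut constraints diag(V V^T) = 1,
    symmetric C, a fixed step size, and retraction by row normalization.
    """
    V = V0 / np.linalg.norm(V0, axis=1, keepdims=True)
    for _ in range(iters):
        G = 2 * C @ V                                   # Euclidean gradient
        # Project each row of G onto the tangent space of the sphere at the row of V.
        G_tan = G - np.sum(G * V, axis=1, keepdims=True) * V
        V = V - step * G_tan                            # step along the tangent direction
        V /= np.linalg.norm(V, axis=1, keepdims=True)   # retract back to the manifold
    return V

# Tiny example: two nodes, one edge; the optimum makes the two rows opposite.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
V = riemannian_gradient_descent(C, np.eye(2))
cost = np.sum((C @ V) * V)
```

On this two-node example the iterates converge to two opposite unit vectors, where the cost reaches its minimum value $-2$.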

  10. Introduction
      minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
      Main drawback of the factorized formulation: unlike the SDP, this problem is non-convex.
      → Riemannian optimization algorithms may get stuck at a critical point instead of finding a global minimizer.
      Whether this issue arises depends on the factorization rank $p$.
      ⇒ How to choose $p$?

  13. Introduction. Outline
      1. Literature review
         ◮ In practice, algorithms work when $p = O(r_{\mathrm{opt}})$.
         ◮ In particular situations, this phenomenon is understood.
         ◮ In a general setting, there are no guarantees unless $p \gtrsim \sqrt{2m}$.
         ◮ But $r_{\mathrm{opt}} \ll \sqrt{2m}$. Why this gap?
      2. Optimal rank for the Burer-Monteiro formulation
         ◮ A minor improvement is possible over previous general guarantees.
         ◮ The improved result is optimal.
           → If $p \lesssim \sqrt{2m}$, Riemannian algorithms cannot be certified correct without assumptions on $C$.
         ◮ Idea of proof.
      3. Open questions

  14. Literature review. Empirical observations
      1. [Burer and Monteiro, 2003] Numerical experiments on various problems, notably MaxCut and minimum bisection relaxations. The factorization rank is $p \approx \sqrt{2m}$; Riemannian algorithms always find a global minimizer. (The authors do not test smaller values of $p$.)
      2. [Journée, Bach, Absil, and Sepulchre, 2010] Numerical experiments on MaxCut relaxations (with a particular initialization scheme). The algorithm proposed by the authors always finds a global minimizer when $p = r_{\mathrm{opt}}$.

  15. Literature review. Empirical observations (continued)
      3. [Boumal, 2015] Numerical experiments on problems coming from orthogonal synchronization. Here, $r_{\mathrm{opt}} = 3$ and the algorithm finds the global minimizer as soon as $p \geq 5$.
      4. Similar results on "SDP-like" problems. See for example [Mishra, Meyer, Bonnabel, and Sepulchre, 2014].

  16. Literature review. Theoretical explanations in particular cases
      [Bandeira, Boumal, and Voroninski, 2016] SDP instances coming from $\mathbb{Z}_2$ synchronization and community detection problems, under specific statistical assumptions.
      → With high probability, $r_{\mathrm{opt}} = 1$.
      → If $p = 2$, Riemannian algorithms find the global minimizer.
      Other particular SDP-like problems have been studied.
      → Under strong assumptions, as soon as $p \geq r_{\mathrm{opt}}$, a global minimizer is found. [Ge, Lee, and Ma, 2016] ...
      Strong guarantees, but in very specific situations only.

  17. Literature review. General case: one main result [Boumal, Voroninski, and Bandeira, 2018]
      minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
      The only assumption is (approximately) that $\mathcal{M}_p \overset{\mathrm{def}}{=} \{V \in \mathbb{R}^{n \times p}, \mathcal{A}(VV^T) = b\}$ is a manifold.

  18. Literature review. General case: one main result [Boumal, Voroninski, and Bandeira, 2018]
      minimize $\mathrm{Trace}(CVV^T)$, for $V \in \mathcal{M}_p$.
      Riemannian optimization algorithms typically converge to second-order critical points. A matrix $V_0 \in \mathcal{M}_p$ is a second-order critical point if
      ◮ $\nabla f_C(V_0) = 0_{n,p}$;
      ◮ $\mathrm{Hess}\, f_C(V_0) \succeq 0$,
      where $f_C \overset{\mathrm{def}}{=} \left( V \in \mathcal{M}_p \mapsto \mathrm{Trace}(CVV^T) \right)$.
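As a hedged illustration of these conditions, the sketch below specializes to the MaxCut constraints (unit-norm rows of $V$) with symmetric $C$; this specialization and the function name are assumptions, not part of the slides. With $S = C - \mathrm{ddiag}(CVV^T)$, the Riemannian gradient on this manifold is $2SV$. The code checks the first-order condition and, rather than the full Hessian condition, the sufficient certificate $S \succeq 0$, which shows by SDP duality that $VV^T$ is a global minimizer.

```python
import numpy as np

def maxcut_criticality(C, V, tol=1e-8):
    """Check first-order criticality and a global optimality certificate.

    Assumption: MaxCut constraints (unit-norm rows of V) and symmetric C.
    With S = C - ddiag(C V V^T), the Riemannian gradient is 2 S V, and
    S >= 0 (PSD) certifies that V V^T solves the SDP.
    """
    X = V @ V.T
    S = C - np.diag(np.diag(C @ X))
    grad_norm = np.linalg.norm(2 * S @ V)
    min_eig_S = np.linalg.eigvalsh((S + S.T) / 2).min()
    return bool(grad_norm <= tol), bool(min_eig_S >= -tol)

C = np.array([[0.0, 1.0], [1.0, 0.0]])
V = np.array([[1.0, 0.0], [-1.0, 0.0]])     # the two rows are opposite unit vectors
is_critical, is_certified = maxcut_criticality(C, V)
```

Checking the second-order condition itself would require evaluating the Hessian on the tangent space, which this sketch does not do.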

  20. Literature review. General case: one main result [Boumal, Voroninski, and Bandeira, 2018]
      Theorem. For almost all matrices $C$, if
      $p > \left\lfloor \sqrt{2m + \frac{1}{4}} - \frac{1}{2} \right\rfloor$,
      all second-order critical points are global minimizers. Consequently, Riemannian optimization algorithms always find a global minimizer.
      Remark: The value of $p$ does not depend on $r_{\mathrm{opt}}$.
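For integer $p$, the rank condition $p > \lfloor \sqrt{2m + 1/4} - 1/2 \rfloor$ of this theorem can be restated without square roots. A short derivation, added here for convenience:

```latex
\begin{align*}
p > \left\lfloor \sqrt{2m + \tfrac{1}{4}} - \tfrac{1}{2} \right\rfloor
  &\iff p > \sqrt{2m + \tfrac{1}{4}} - \tfrac{1}{2}
     && \text{($p$ is an integer)} \\
  &\iff \left(p + \tfrac{1}{2}\right)^2 > 2m + \tfrac{1}{4} \\
  &\iff \frac{p(p+1)}{2} > m .
\end{align*}
```

For example, a MaxCut relaxation with $n = 10$ has $m = 10$ constraints; since $4 \cdot 5 / 2 = 10$ is not larger than $10$ while $5 \cdot 6 / 2 = 15$ is, the condition reads $p \geq 5$.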

  22. Literature review. Summary
      ◮ In empirical experiments, as well as in the few particular cases that have been studied, algorithms seem to always work when $p = O(r_{\mathrm{opt}})$.
      ◮ The only available general result guarantees that algorithms work when $p \gtrsim \sqrt{2m}$.
      As $r_{\mathrm{opt}}$ is often much smaller than $\sqrt{2m}$, this leaves a big gap.
      → Is it possible to obtain general guarantees for $p \ll \sqrt{2m}$?

  24. Optimal rank for the Burer-Monteiro factorization. Overview of our results
      ◮ A minor improvement is possible over the result by [Boumal, Voroninski, and Bandeira, 2018], but it does not change the leading order term $p \gtrsim \sqrt{2m}$.
      ◮ With this improvement, the result is essentially optimal, even if $r_{\mathrm{opt}} \ll \sqrt{2m}$.

  25. Optimal rank for the Burer-Monteiro factorization. Improving [Boumal, Voroninski, and Bandeira, 2018]
      Theorem. For almost all matrices $C$, if
      $p > \left\lfloor \sqrt{2m + \frac{9}{4}} - \frac{3}{2} \right\rfloor$,
      all second-order critical points of the factorized problem are global minimizers.
      In [Boumal, Voroninski, and Bandeira, 2018], the threshold was $\left\lfloor \sqrt{2m + \frac{1}{4}} - \frac{1}{2} \right\rfloor$. Our result is better by one unit for most values of $m$.
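To make the "better by one unit for most values of $m$" statement concrete, the sketch below compares the smallest admissible $p$ under both conditions. It uses the equivalent integer forms $p(p+1)/2 > m$ for the 2018 condition and $p(p+3)/2 > m$ for the improved one (both follow by squaring, for integer $p$). The function names and the range of $m$ are illustrative.

```python
def min_rank_bvb(m: int) -> int:
    """Smallest integer p with p(p+1)/2 > m (condition from the 2018 result)."""
    p = 1
    while p * (p + 1) // 2 <= m:
        p += 1
    return p

def min_rank_improved(m: int) -> int:
    """Smallest integer p with p(p+3)/2 > m (improved condition)."""
    p = 1
    while p * (p + 3) // 2 <= m:
        p += 1
    return p

ms = range(1, 1001)
gains = [min_rank_bvb(m) - min_rank_improved(m) for m in ms]
# Fraction of values of m for which the improved threshold is smaller by one.
share_improved = sum(g == 1 for g in gains) / len(gains)
```

The gain is always 0 or 1; it is 0 exactly when $m = p(p+1)/2 - 1$ for some integer $p \geq 2$ (for instance $m = 2, 5, 9$).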
