Rank optimality for the Burer-Monteiro factorization
Irène Waldspurger
CNRS and CEREMADE (Université Paris Dauphine), Équipe MOKAPLAN (INRIA)
Joint work with Alden Waters (Bernoulli Institute, Rijksuniversiteit Groningen)
March 10, 2020
Workshop Optimization for machine learning, CIRM, Luminy

Introduction 2 / 36 Semidefinite programming
minimize $\mathrm{Trace}(CX)$ such that $\mathcal{A}(X) = b$, $X \succeq 0$.
Here,
◮ $X$, the unknown, is an $n \times n$ matrix;
◮ $C$ is a fixed $n \times n$ matrix (cost matrix);
◮ $\mathcal{A} : \mathrm{Sym}_n \to \mathbb{R}^m$ is linear;
◮ $b$ is a fixed vector in $\mathbb{R}^m$.

Introduction 3 / 36 Motivations
Various difficult problems can be “lifted” to SDPs, and solving these lifted SDPs may solve the original problems.
Particularly important example: relaxation of MaxCut.
minimize $\mathrm{Trace}(CX)$ such that $\mathrm{diag}(X) = 1$, $X \succeq 0$.
Relaxes the Maximum Cut problem from graph theory. [Delorme and Poljak, 1993]
Appears also in phase retrieval, $\mathbb{Z}_2$ synchronization ...

Introduction 4 / 36 Numerical solvers
General SDPs can be solved to arbitrary precision in polynomial time, but the degree of the polynomial is large.
Interior point solvers, for instance, have a per-iteration complexity of $O(n^4)$ in full generality (when $m$ and $n$ are of the same order).
First-order solvers, applied to a smoothed problem, have an $O(n^3)$ per-iteration complexity, but require more iterations.
→ Numerically, high-dimensional SDPs are difficult to solve.

Introduction 5 / 36 Exploiting the low rank
To speed up these algorithms: assume that there exists a low-rank solution and exploit this fact.
◮ [Pataki, 1998]: There is always a solution with rank $r_{\mathrm{opt}} \le \sqrt{2m + 1/4} - 1/2 \approx \sqrt{2m}$.
◮ In many situations, there is actually a solution with rank $r_{\mathrm{opt}} = O(1)$.
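To make the rank bound of [Pataki, 1998] concrete, here is a minimal sketch that evaluates it for a given number of constraints $m$. The function name `pataki_rank_bound` is hypothetical, and the code relies only on the fact that an integer $r$ satisfies $r \le \sqrt{2m + 1/4} - 1/2$ exactly when $r(r+1)/2 \le m$.

```python
import math

def pataki_rank_bound(m: int) -> int:
    """Largest integer r with r(r+1)/2 <= m, i.e. floor(sqrt(2m + 1/4) - 1/2).

    Hypothetical helper for illustration; it is not taken from the slides.
    """
    # sqrt(2m + 1/4) - 1/2 equals (sqrt(8m + 1) - 1) / 2.
    # math.isqrt keeps the computation in exact integer arithmetic.
    return (math.isqrt(8 * m + 1) - 1) // 2

# Example: a MaxCut relaxation with n = 1000 nodes has m = 1000 constraints.
print(pataki_rank_bound(1000))  # 44, close to sqrt(2 * 1000) ~ 44.7
```

For the MaxCut relaxation, $m = n$, so the guaranteed rank grows like $\sqrt{2n}$, far below the ambient dimension $n$.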

Introduction 6 / 36 Exploiting the low rank
Two main strategies:
◮ Frank-Wolfe methods; [Frank and Wolfe, 1956]
◮ Burer-Monteiro factorization. [Burer and Monteiro, 2003]

Introduction 7 / 36 Burer-Monteiro factorization
◮ Assume that there is a solution with rank $r_{\mathrm{opt}}$.
◮ Choose some integer $p \ge r_{\mathrm{opt}}$.
◮ Write $X$ in the form $X = VV^T$, with $V$ an $n \times p$ matrix.
◮ Minimize $\mathrm{Trace}(CVV^T)$ over $V$.

Introduction 8 / 36
minimize $\mathrm{Trace}(CX)$ for $X \in \mathbb{R}^{n \times n}$ such that $\mathcal{A}(X) = b$, $X \succeq 0$.
→
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
Remark: $p$ is the factorization rank. It must be chosen, and can be equal to or larger than $r_{\mathrm{opt}}$.

Introduction 9 / 36
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
We assume that $\{V \in \mathbb{R}^{n \times p}, \mathcal{A}(VV^T) = b\}$ is a “nice” manifold.
→ Riemannian optimization algorithms.
Main advantage of the factorized formulation
The number of variables is not $O(n^2)$ anymore, but $O(np)$, with possibly $p \ll n$.
→ Riemannian algorithms can be much faster than SDP solvers.
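As an illustration of the factorized formulation, the following is a minimal sketch for the MaxCut relaxation, where the constraint $\mathrm{diag}(VV^T) = 1$ means every row of $V$ has unit norm. It uses plain Riemannian gradient descent with a fixed step and row normalization as the retraction. The slides do not specify an algorithm, so the function name `burer_monteiro_maxcut`, the step-size rule and the iteration count are all assumptions made for this example.

```python
import numpy as np

def burer_monteiro_maxcut(C, p, steps=500, seed=0):
    """Sketch: minimize Trace(C V V^T) over n x p matrices V with unit-norm rows.

    Hypothetical helper; C is assumed symmetric. No convergence guarantee is
    claimed for this fixed-step scheme.
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    # Random feasible starting point: normalize each row of a Gaussian matrix.
    V = rng.standard_normal((n, p))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    # Heuristic fixed step based on the spectral norm of C (an assumption).
    step = 1.0 / (2.0 * np.linalg.norm(C, 2) + 1e-12)
    for _ in range(steps):
        G = 2.0 * C @ V  # Euclidean gradient of Trace(C V V^T) for symmetric C
        # Project onto the tangent space: remove each row's radial component.
        G -= np.sum(G * V, axis=1, keepdims=True) * V
        V = V - step * G
        # Retraction: renormalize rows so that diag(V V^T) = 1 again.
        V /= np.linalg.norm(V, axis=1, keepdims=True)
    return V

# Usage on a small random symmetric cost matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
C = (A + A.T) / 2
V = burer_monteiro_maxcut(C, p=3)
X = V @ V.T  # feasible for the SDP: X is PSD with unit diagonal, rank at most 3
```

The iterate stores $np$ numbers instead of the $n^2$ entries of $X$, which is where the speed-up comes from.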

Introduction 10 / 36
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
Main drawback of the factorized formulation
Contrary to the SDP, this problem is non-convex.
→ Riemannian optimization algorithms may get stuck at a critical point instead of finding a global minimizer.
Whether this issue arises depends on the factorization rank $p$.
⇒ How to choose $p$?

Introduction 11 / 36 Outline
1. Literature review
◮ In practice, algorithms work when $p = O(r_{\mathrm{opt}})$.
◮ In particular situations, this phenomenon is understood.
◮ In a general setting, no guarantees unless $p \gtrsim \sqrt{2m}$.
◮ But $r_{\mathrm{opt}} \ll \sqrt{2m}$. Why this gap?
2. Optimal rank for the Burer-Monteiro formulation
◮ A minor improvement is possible over previous general guarantees.
◮ The improved result is optimal.
→ If $p \lesssim \sqrt{2m}$, Riemannian algorithms cannot be certified correct without assumptions on $C$.
◮ Idea of proof.
3. Open questions

Literature review 12 / 36 Empirical observations
1. [Burer and Monteiro, 2003] Numerical experiments on various problems, notably MaxCut and minimum bisection relaxations. The factorization rank is $p \approx \sqrt{2m}$; Riemannian algorithms always find a global minimizer. (The authors do not test smaller values of $p$.)
2. [Journée, Bach, Absil, and Sepulchre, 2010] Numerical experiments on MaxCut relaxations (with a particular initialization scheme). The algorithm proposed by the authors always finds a global minimizer when $p = r_{\mathrm{opt}}$.

Literature review 13 / 36 Empirical observations (continued)
3. [Boumal, 2015] Numerical experiments on problems coming from orthogonal synchronization. Here, $r_{\mathrm{opt}} = 3$ and the algorithm finds the global minimizer as soon as $p \ge 5$.
4. Similar results on “SDP-like” problems. See for example [Mishra, Meyer, Bonnabel, and Sepulchre, 2014].

Literature review 14 / 36 Theoretical explanations in particular cases
[Bandeira, Boumal, and Voroninski, 2016] SDP instances coming from $\mathbb{Z}_2$ synchronization and community detection problems, under specific statistical assumptions.
→ With high probability, $r_{\mathrm{opt}} = 1$.
→ If $p = 2$, Riemannian algorithms find the global minimizer.
Other particular SDP-like problems have been studied.
→ Under strong assumptions, as soon as $p \ge r_{\mathrm{opt}}$, a global minimizer is found. [Ge, Lee, and Ma, 2016] ...
Strong guarantees, but in very specific situations only.

Literature review 15 / 36 General case: one main result [Boumal, Voroninski, and Bandeira, 2018]
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
The only assumption is (approximately) that
$\mathcal{M}_p \overset{\mathrm{def}}{=} \{V \in \mathbb{R}^{n \times p}, \mathcal{A}(VV^T) = b\}$
is a manifold.

Literature review 16 / 36 General case: one main result [Boumal, Voroninski, and Bandeira, 2018]
minimize $\mathrm{Trace}(CVV^T)$, for $V \in \mathcal{M}_p$.
Riemannian optimization algorithms typically converge to second-order critical points:
A matrix $V_0 \in \mathcal{M}_p$ is a second-order critical point if
◮ $\nabla f_C(V_0) = 0_{n,p}$;
◮ $\mathrm{Hess}\, f_C(V_0) \succeq 0$,
where $f_C \overset{\mathrm{def}}{=} \left( V \in \mathcal{M}_p \mapsto \mathrm{Trace}(CVV^T) \right)$.

Literature review 17 / 36 General case: one main result [Boumal, Voroninski, and Bandeira, 2018]
Theorem
For almost all matrices $C$, if
$p > \sqrt{2m + \tfrac{1}{4}} - \tfrac{1}{2}$,
all second-order critical points are global minimizers.
Consequently, Riemannian optimization algorithms always find a global minimizer.
Remark: The value of $p$ does not depend on $r_{\mathrm{opt}}$.
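The threshold in this theorem can be rewritten without a square root. The short derivation below is a sketch added for readability, assuming $p$ is a positive integer; it shows that the condition amounts to $p(p+1)/2 > m$, that is, the dimension of the space of symmetric $p \times p$ matrices exceeds the number of constraints.

```latex
\begin{align*}
p > \sqrt{2m + \tfrac{1}{4}} - \tfrac{1}{2}
  &\iff p + \tfrac{1}{2} > \sqrt{2m + \tfrac{1}{4}} \\
  &\iff \left(p + \tfrac{1}{2}\right)^2 > 2m + \tfrac{1}{4}
     && \text{(both sides are positive)} \\
  &\iff p^2 + p > 2m \\
  &\iff \frac{p(p+1)}{2} > m .
\end{align*}
```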

Literature review 18 / 36 Summary
◮ In empirical experiments, as well as in the few particular cases that have been studied, algorithms seem to always work when $p = O(r_{\mathrm{opt}})$.
◮ The only available general result guarantees that algorithms work when $p \gtrsim \sqrt{2m}$.
As $r_{\mathrm{opt}}$ is often much smaller than $\sqrt{2m}$, this leaves a big gap.
→ Is it possible to obtain general guarantees for $p \ll \sqrt{2m}$?

Optimal rank for the Burer-Monteiro factorization 19 / 36 Overview of our results
◮ A minor improvement is possible over the result by [Boumal, Voroninski, and Bandeira, 2018], but it does not change the leading-order term $p \gtrsim \sqrt{2m}$.
◮ With this improvement, the result is essentially optimal, even if $r_{\mathrm{opt}} \ll \sqrt{2m}$.

Optimal rank for the Burer-Monteiro factorization 20 / 36 Improving [Boumal, Voroninski, and Bandeira, 2018]
Theorem
For almost all matrices $C$, if
$p > \sqrt{2m + \tfrac{9}{4}} - \tfrac{3}{2}$,
all second-order critical points of the factorized problem are global minimizers.
In [Boumal, Voroninski, and Bandeira, 2018], the threshold was $\sqrt{2m + \tfrac{1}{4}} - \tfrac{1}{2}$. Our result is better by one unit for most values of $m$.
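The two thresholds can be compared numerically. The sketch below computes, for a given $m$, the smallest integer $p$ allowed by each theorem, using the equivalent integer conditions $p(p+1)/2 > m$ for [Boumal, Voroninski, and Bandeira, 2018] and $p(p+3)/2 > m$ for the improved bound. The function names are hypothetical and chosen for this example only.

```python
import math

def min_rank_bvb(m: int) -> int:
    """Smallest integer p with p > sqrt(2m + 1/4) - 1/2, i.e. p(p+1)/2 > m."""
    # Largest p with p(p+1)/2 <= m, i.e. (2p + 1)^2 <= 8m + 1, then add one.
    return (math.isqrt(8 * m + 1) - 1) // 2 + 1

def min_rank_improved(m: int) -> int:
    """Smallest integer p with p > sqrt(2m + 9/4) - 3/2, i.e. p(p+3)/2 > m."""
    # Largest p with p(p+3)/2 <= m, i.e. (2p + 3)^2 <= 8m + 9, then add one.
    return (math.isqrt(8 * m + 9) - 3) // 2 + 1

for m in (2, 10, 1000):
    print(m, min_rank_bvb(m), min_rank_improved(m))
# 2 2 2
# 10 5 4
# 1000 45 44
```

The case $m = 2$ is one of the values for which the two bounds coincide; for $m = 10$ and $m = 1000$ the improved bound is smaller by one.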
