Rank optimality for the Burer-Monteiro factorization
Irène Waldspurger
CNRS and CEREMADE (Université Paris Dauphine), Équipe MOKAPLAN (INRIA)
Joint work with Alden Waters (Bernoulli Institute, Rijksuniversiteit Groningen)
April 3, 2019
Imaging and machine learning, The mathematics of imaging semester, Institut Henri Poincaré

Introduction: Semidefinite programming

minimize $\mathrm{Trace}(CX)$ such that $\mathcal{A}(X) = b$, $X \succeq 0$.

Here,
◮ $X$, the unknown, is an $n \times n$ matrix;
◮ $C$ is a fixed $n \times n$ matrix (cost matrix);
◮ $\mathcal{A} : \mathrm{Sym}_n \to \mathbb{R}^m$ is linear;
◮ $b$ is a fixed vector in $\mathbb{R}^m$.

Introduction: Motivations

Various difficult problems can be “lifted” to SDPs, and solving these lifted SDPs may solve the original problems.

Particularly important example: relaxation of MaxCut.

minimize $\mathrm{Trace}(CX)$ such that $\mathrm{diag}(X) = 1$, $X \succeq 0$.

This relaxes the Maximum Cut problem from graph theory. [Delorme and Poljak, 1993]
It also appears in phase retrieval, $\mathbb{Z}_2$ synchronization ...
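As a minimal sketch of the lifting, the snippet below builds a rank-one feasible point $X = xx^T$ from a cut vector $x \in \{\pm 1\}^n$ and checks that the lifted objective matches the cut value. The choice $C = -L/4$ (with $L$ the graph Laplacian) is one common convention and is an assumption here, as are the helper names and the toy graph; the slides do not specify $C$.

```python
import numpy as np

def maxcut_cost_matrix(W):
    """Return C = -L/4, where L = diag(W 1) - W is the graph Laplacian.
    (Assumed convention: minimizing Trace(CX) then maximizes the cut.)"""
    L = np.diag(W.sum(axis=1)) - W
    return -L / 4.0

def cut_value(W, x):
    """Total weight of the edges whose endpoints have different signs in x."""
    different = np.not_equal.outer(x, x)
    return W[different].sum() / 2.0  # W is symmetric, so each edge appears twice

# Toy graph: the 4-cycle 0-1-2-3-0 with unit weights.
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    W[i, j] = W[j, i] = 1.0

C = maxcut_cost_matrix(W)
x = np.array([1.0, -1.0, 1.0, -1.0])  # alternating sides: all four edges are cut
X = np.outer(x, x)                     # lifted variable, rank one

lifted_objective = np.trace(C @ X)
# Feasibility for the relaxation: unit diagonal and positive semidefinite.
is_feasible = (np.allclose(np.diag(X), 1.0)
               and np.linalg.eigvalsh(X).min() >= -1e-12)
```

Every cut gives such a feasible rank-one $X$, so the SDP optimum is a lower bound on the negated maximum cut; the relaxation consists of dropping the rank-one requirement.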

Introduction: Numerical solvers

SDPs can be solved to a given precision in polynomial time, but the degree of the polynomial may be large.

Interior point solvers, for instance, have a per-iteration complexity of $O(n^4)$ in full generality (when $m$ and $n$ are of the same order).

First-order solvers, applied to a smoothed problem, have an $O(n^3)$ complexity, but require more iterations.

→ Numerically, high-dimensional SDPs are difficult to solve.

Introduction: Exploiting the low rank

To speed up these algorithms: exploit the structure of the problem.

Here, the “structure” we consider is the fact that there exists a low-rank solution.
◮ There is always a solution with rank $r_{\mathrm{opt}}$ at most $\sqrt{2m + 1/4} - 1/2$. [Pataki, 1998]
◮ In many situations, there is actually a solution with rank $r_{\mathrm{opt}} = O(1)$.
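The square-root bound may look arbitrary. As background (not stated on the slide), it is usually obtained from the condition $r(r+1)/2 \le m$ on the rank $r$ of an extreme solution; solving this quadratic inequality for $r \ge 0$ recovers the expression above:

```latex
\frac{r(r+1)}{2} \le m
\iff r^2 + r - 2m \le 0
\iff r \le \frac{-1 + \sqrt{1 + 8m}}{2} = \sqrt{2m + \tfrac{1}{4}} - \tfrac{1}{2}.
```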

Introduction: Burer-Monteiro factorization

We focus on one heuristic that takes advantage of the low rank: the Burer-Monteiro factorization. [Burer and Monteiro, 2003]

If there is a solution with rank $r_{\mathrm{opt}}$, we can write $X$ in the form $X = VV^T$, with $V$ an $n \times p$ matrix and $p \ge r_{\mathrm{opt}}$.

→ We optimize over $V$ instead of optimizing over $X$.

Introduction

minimize $\mathrm{Trace}(CX)$ for $X \in \mathbb{R}^{n \times n}$ such that $\mathcal{A}(X) = b$, $X \succeq 0$

is replaced by

minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.

Remark: The factorization rank $p$ must be chosen. It can be different from $r_{\mathrm{opt}}$, the rank of the solution.

Introduction

minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.

We assume that $\{V \in \mathbb{R}^{n \times p}, \mathcal{A}(VV^T) = b\}$ is a “nice” manifold.

→ Riemannian optimization algorithms.

Main advantage of the factorized formulation: the number of variables is no longer $O(n^2)$, but $O(np)$, with possibly $p \ll n$.

→ Less computationally demanding algorithms can be used.
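To make the factorized formulation concrete, here is a rough sketch for the MaxCut-type constraint $\mathrm{diag}(VV^T) = 1$, where the manifold is the set of matrices with unit-norm rows. It uses a plain fixed-step Riemannian gradient descent with row normalization as the retraction. This is a deliberately simple stand-in, not the algorithm used in any of the cited works, and the function names, step size, and toy instance are all assumptions made for illustration.

```python
import numpy as np

def objective(C, V):
    """f(V) = Trace(C V V^T)."""
    return np.sum((C @ V) * V)

def riemannian_gradient(C, V):
    """Project the Euclidean gradient 2CV onto the tangent space of
    {V : every row of V has unit norm} (C is assumed symmetric)."""
    G = 2.0 * C @ V
    radial = np.sum(G * V, axis=1, keepdims=True)  # <g_i, v_i> for each row i
    return G - radial * V

def retract(V):
    """Map a matrix back to the constraint set by normalizing its rows."""
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def factorized_descent(C, p, steps=500, seed=0):
    """Fixed-step Riemannian gradient descent (illustrative, not tuned)."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    V = retract(rng.standard_normal((n, p)))
    # Conservative heuristic step size based on the largest absolute row sum of C.
    step = 1.0 / (4.0 * np.abs(C).sum(axis=1).max())
    values = [objective(C, V)]
    for _ in range(steps):
        V = retract(V - step * riemannian_gradient(C, V))
        values.append(objective(C, V))
    return V, values

# Toy instance (assumption): MaxCut cost C = -L/4 on a 6-cycle with unit weights.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
C = -(np.diag(W.sum(axis=1)) - W) / 4.0

V, values = factorized_descent(C, p=3)
```

The iterate $V$ has $np$ entries rather than the $n^2$ of $X$, which is the saving the slide refers to. Nothing in this sketch certifies that the final point is a global minimizer.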

Introduction

minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.

Main drawback of the factorized formulation: unlike the SDP, this problem is non-convex.

→ Riemannian optimization algorithms may get stuck at a critical point instead of finding a global minimizer.

Whether this issue arises depends on the factorization rank $p$.

⇒ How to choose $p$?

Introduction: Outline

1. Literature review
◮ In practice, algorithms work when $p = O(r_{\mathrm{opt}})$.
◮ In particular situations, this phenomenon is understood.
◮ In a general setting, no guarantees for $p \lesssim \sqrt{2m}$.
◮ Why this gap?

2. Optimal rank for the Burer-Monteiro formulation
◮ Up to a minor improvement, $p \approx \sqrt{2m}$ is the optimal rank for which general guarantees can be derived.
◮ Consequently, when $p \lesssim \sqrt{2m}$, Riemannian optimization algorithms cannot be certified correct without assumptions on $C$.

3. Open questions

Literature review: Empirical observations

1. [Burer and Monteiro, 2003] Numerical experiments on various problems, notably MaxCut and minimum bisection relaxations. The factorization rank is $p \approx \sqrt{2m}$, and algorithms always find a global minimizer. (The authors do not test smaller values of $p$.)

2. [Journée, Bach, Absil, and Sepulchre, 2010] Numerical experiments on MaxCut relaxations (with a particular initialization scheme). The algorithm proposed by the authors always finds a global minimizer when $p = r_{\mathrm{opt}}$.

Literature review: Empirical observations (continued)

3. [Boumal, 2015] Numerical experiments on problems coming from orthogonal synchronization. Here, $r_{\mathrm{opt}} = 3$ and the algorithm finds the global minimizer as soon as $p \ge 5$.

4. Similar results on “SDP-like” problems. See for example [Mishra, Meyer, Bonnabel, and Sepulchre, 2014].

Literature review: Theoretical explanations in particular cases

[Bandeira, Boumal, and Voroninski, 2016] SDP instances coming from $\mathbb{Z}_2$ synchronization and community detection problems, under specific statistical assumptions.
→ With high probability, $r_{\mathrm{opt}} = 1$.
→ If $p = 2$, Riemannian algorithms find the global minimizer.

Other particular SDP-like problems have been studied.
→ Under strong assumptions, $p \ge r_{\mathrm{opt}}$ is enough for a global minimizer to be found. [Ge, Lee, and Ma, 2016] ...

Literature review: General case: one main result

[Boumal, Voroninski, and Bandeira, 2018]

minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.

Main hypothesis (approximately): $\mathcal{M}_p \overset{\text{def}}{=} \{V \in \mathbb{R}^{n \times p}, \mathcal{A}(VV^T) = b\}$ is a manifold.

[More precisely: for all $V \in \mathcal{M}_p$, the map $\varphi_V : \dot V \in \mathbb{R}^{n \times p} \to \mathcal{A}(V \dot V^T + \dot V V^T) \in \mathbb{R}^m$ is surjective.]
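As a worked check of this hypothesis, consider the MaxCut relaxation from the introduction, where $\mathcal{A}(X) = \mathrm{diag}(X)$, $b = 1$ and $m = n$. The notation $v_i$, $\dot v_i$ for the rows of $V$, $\dot V$ and the vector $t$ are introduced here for illustration only.

```latex
\mathcal{M}_p = \{ V \in \mathbb{R}^{n \times p} : \|v_i\| = 1 \text{ for } i = 1, \dots, n \},
\qquad
\varphi_V(\dot V) = \mathrm{diag}(V \dot V^T + \dot V V^T)
                  = 2 \left( \langle v_i, \dot v_i \rangle \right)_{i = 1, \dots, n}.

% Surjectivity: given any t in R^n, take \dot v_i = (t_i / 2) v_i. Then
2 \langle v_i, \tfrac{t_i}{2} v_i \rangle = t_i \|v_i\|^2 = t_i ,
\quad \text{so } \varphi_V(\dot V) = t .
```

So in this case the hypothesis holds at every point of $\mathcal{M}_p$, because no row of $V$ can vanish.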

Literature review: General case: one main result

[Boumal, Voroninski, and Bandeira, 2018]

minimize $\mathrm{Trace}(CVV^T)$, for $V \in \mathcal{M}_p$.

Riemannian optimization algorithms typically converge to second-order critical points. A matrix $V_0 \in \mathcal{M}_p$ is a second-order critical point if
◮ $\nabla f_C(V_0) = 0_{n,p}$;
◮ $\mathrm{Hess}\, f_C(V_0) \succeq 0$,
where $f_C \overset{\text{def}}{=} \left( V \in \mathcal{M}_p \to \mathrm{Trace}(CVV^T) \right)$.
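For intuition, these two conditions can be written explicitly in the MaxCut case $\mathcal{A} = \mathrm{diag}$, assuming $C$ is symmetric and using the metric induced by the Frobenius inner product. The matrix $S$ and the operator $\mathrm{ddiag}$ (which keeps only the diagonal of a matrix) are notation introduced here, not taken from the slides.

```latex
S(V) = C - \mathrm{ddiag}(C V V^T),
\qquad
\nabla f_C(V) = 2\, S(V)\, V ,

\langle \dot V, \mathrm{Hess}\, f_C(V)[\dot V] \rangle = 2\, \langle \dot V, S(V)\, \dot V \rangle
\quad \text{for all tangent } \dot V, \text{ i.e. } \mathrm{diag}(\dot V V^T) = 0 .
```

In this special case, $V_0$ is a second-order critical point when $S(V_0) V_0 = 0$ and $\langle \dot V, S(V_0) \dot V \rangle \ge 0$ for every tangent direction $\dot V$.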

Literature review: General case: one main result

[Boumal, Voroninski, and Bandeira, 2018]

Theorem. Under suitable hypotheses, for almost all matrices $C$, if
$p > \sqrt{2m + \frac{1}{4}} - \frac{1}{2}$,
all second-order critical points of the factorized problem are global minimizers.

Consequently, Riemannian optimization algorithms always find a global minimizer.

Remark: The value of $p$ does not depend on $r_{\mathrm{opt}}$.
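To get a feel for the size of this threshold, the following sketch computes the smallest integer $p$ satisfying the theorem's inequality, which is equivalent to $p(p+1)/2 > m$, and compares variable counts. The function names are hypothetical, and the choice $m = n$ in the example mirrors the MaxCut relaxation.

```python
import math

def smallest_guaranteed_rank(m):
    """Smallest integer p with p(p+1)/2 > m, i.e. p > sqrt(2m + 1/4) - 1/2.
    Integer arithmetic avoids floating-point rounding for large m."""
    r = (math.isqrt(8 * m + 1) - 1) // 2  # largest r with r(r+1)/2 <= m
    return r + 1

def variable_counts(n, p):
    """Return (entries of V, free entries of a symmetric n x n matrix X)."""
    return n * p, n * (n + 1) // 2

# Example with m = n = 1000 constraints, as in a MaxCut relaxation on 1000 nodes.
n = m = 1000
p = smallest_guaranteed_rank(m)
factorized, full = variable_counts(n, p)
```

For $m = n = 1000$ this gives $p = 45$, so the factorized problem has 45,000 variables against 500,500 for the full symmetric matrix. The threshold still grows like $\sqrt{2m}$, whatever the value of $r_{\mathrm{opt}}$.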

Literature review: Summary

◮ In empirical experiments, as well as in the few particular cases that have been studied, algorithms seem to always work when $p = O(r_{\mathrm{opt}})$.
◮ The only available general result guarantees that algorithms work when $p \gtrsim \sqrt{2m}$.

As $r_{\mathrm{opt}}$ is often much smaller than $\sqrt{2m}$, this leaves a big gap.

→ Is it possible to obtain general guarantees for $p \ll \sqrt{2m}$?

Optimal rank for the Burer-Monteiro factorization: Overview of our results

◮ A minor improvement is possible over the result by [Boumal, Voroninski, and Bandeira, 2018], but it does not change the leading order term $p \gtrsim \sqrt{2m}$.
◮ With this improvement, the result is essentially optimal, even if $r_{\mathrm{opt}} \ll \sqrt{2m}$.
