slide-1
SLIDE 1

Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization

Cong Ma ORFE, Princeton University

slide-2
SLIDE 2

Yuling Yan (Princeton ORFE), Yuejie Chi (CMU ECE), Jianqing Fan (Princeton ORFE), Yuxin Chen (Princeton EE)

slide-3
SLIDE 3

Convex relaxation for low-rank structure

minimize_Z  ‖Z‖_*   subject to   noiseless data constraints

[figure: low-rank matrix; credit: Piet Mondrian]

(a semidefinite relaxation)

3/ 39

slide-4
SLIDE 4

Convex relaxation for low-rank structure

minimize_Z  ‖Z‖_*   subject to   noiseless data constraints

• matrix sensing (Recht, Fazel, Parrilo '07)
• phase retrieval (Candès, Strohmer, Voroninski '11; Candès, Li '12)
• matrix completion (Candès, Recht '08; Candès, Tao '08; Gross '09)
• robust PCA (Chandrasekaran et al. '09; Candès et al. '09)
• Hankel matrix completion (Fazel et al. '13; Chen, Chi '13; Cai et al. '15)
• blind deconvolution (Ahmed, Recht, Romberg '12; Ling, Strohmer '15)
• joint alignment / matching (Chen, Huang, Guibas '14)
• . . .

3/ 39

slide-5
SLIDE 5

Stability of convex relaxation against noise

minimize_Z  ‖Z‖_*   subject to   noisy data constraints

[figure: low-rank matrix; credit: Piet Mondrian]

(a semidefinite relaxation)

4/ 39

slide-6
SLIDE 6

Stability of convex relaxation against noise

minimize_Z  f(Z; noisy data) + λ‖Z‖_*      (empirical loss + nuclear-norm penalty)

[figure: low-rank matrix; credit: Piet Mondrian]

(a semidefinite relaxation)

4/ 39

slide-7
SLIDE 7

Stability of convex relaxation against noise

minimize_Z  f(Z; noisy data) + λ‖Z‖_*      (empirical loss + nuclear-norm penalty)

• matrix sensing (RIP measurements) (Candès, Plan '10)
• phase retrieval (Gaussian measurements) (Candès et al. '11)
? matrix completion (Candès, Plan '09; Negahban, Wainwright '10; Koltchinskii et al. '10)
? robust PCA (Zhou, Li, Wright, Candès, Ma '10)
? Hankel matrix completion (Chen, Chi '13)
? blind deconvolution (Ahmed, Recht, Romberg '12; Ling, Strohmer '15)
? joint alignment / matching . . .

4/ 39

slide-8
SLIDE 8

Stability of convex relaxation against noise

minimize_Z  f(Z; noisy data) + λ‖Z‖_*      (empirical loss + nuclear-norm penalty)

• matrix sensing (RIP measurements) (Candès, Plan '10)
• phase retrieval (Gaussian measurements) (Candès et al. '11)
? this talk: matrix completion (Candès, Plan '09; Negahban, Wainwright '10; Koltchinskii et al. '10)
? robust PCA (Zhou, Li, Wright, Candès, Ma '10)
? Hankel matrix completion (Chen, Chi '13)
? blind deconvolution (Ahmed, Recht, Romberg '12; Ling, Strohmer '15)
? joint alignment / matching . . .

4/ 39

slide-9
SLIDE 9

Low-rank matrix completion

           

[figure: a matrix with most entries missing ("?") and only a few revealed; credit: E. J. Candès]

Given partial samples of a low-rank matrix M ⋆, fill in missing entries

5/ 39

slide-10
SLIDE 10

Noisy low-rank matrix completion

• observations: M_{i,j} = M⋆_{i,j} + noise,  (i, j) ∈ Ω
• goal: estimate the unknown rank-r matrix M⋆ ∈ ℝ^{n×n}

           

[figure: the partially observed matrix, with observed entries indexed by the sampling set Ω]

6/ 39

slide-11
SLIDE 11

Noisy low-rank matrix completion

• observations: M_{i,j} = M⋆_{i,j} + noise,  (i, j) ∈ Ω
• goal: estimate M⋆

convex relaxation:

    minimize_{Z ∈ ℝ^{n×n}}  Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*      (squared loss + nuclear-norm penalty)

6/ 39
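The convex program above can be prototyped in a few lines. Below is a minimal sketch (mine, not from the talk) using cvxpy; the problem sizes, sampling rate p, noise level σ, and the choice λ = 5σ√(np) (borrowed from a later slide) are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r, p, sigma = 50, 3, 0.3, 0.01

# ground-truth rank-r matrix M*, Bernoulli(p) sampling mask, noisy observations
M_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = (rng.random((n, n)) < p).astype(float)
M_obs = M_star + sigma * rng.standard_normal((n, n))

# squared loss on observed entries + nuclear-norm penalty
lam = 5 * sigma * np.sqrt(n * p)            # lambda on the order of sigma * sqrt(n p)
Z = cp.Variable((n, n))
loss = cp.sum_squares(cp.multiply(mask, Z - M_obs))
prob = cp.Problem(cp.Minimize(loss + lam * cp.normNuc(Z)))
prob.solve()

rel_err = np.linalg.norm(Z.value - M_star, "fro") / np.linalg.norm(M_star, "fro")
print(f"relative Frobenius error: {rel_err:.3e}")
```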

slide-12
SLIDE 12

Prior statistical guarantees for convex relaxation

  • random sampling: each (i, j) ∈ Ω with prob. p
  • random noise: i.i.d. sub-Gaussian noise with variance σ2
  • true matrix M ⋆ ∈ Rn×n: rank r = O(1), incoherent, . . .

7/ 39

slide-13
SLIDE 13

• Candès, Plan '09: ‖M − M⋆‖_F ≲ σ n^{1.5}

[plot: estimation error ‖M − M⋆‖_F vs. noise standard deviation σ]

slide-14
SLIDE 14

• minimax limit: σ √(n/p)
• Candès, Plan '09: σ n^{1.5}

[plot: estimation error ‖M − M⋆‖_F vs. noise standard deviation σ; reference level ‖M⋆‖_∞]

slide-15
SLIDE 15

• minimax limit: σ √(n/p)
• Candès, Plan '09: σ n^{1.5}
• Negahban, Wainwright '10: max{σ, ‖M⋆‖_∞} √(n/p)

[plot: estimation error ‖M − M⋆‖_F vs. noise standard deviation σ]

slide-16
SLIDE 16

• minimax limit: σ √(n/p)
• Candès, Plan '09: σ n^{1.5}
• Negahban, Wainwright '10: max{σ, ‖M⋆‖_∞} √(n/p)
• Koltchinskii, Tsybakov, Lounici '10: max{σ, ‖M⋆‖_∞} √(n/p)

[plot: estimation error ‖M − M⋆‖_F vs. noise standard deviation σ]

slide-17
SLIDE 17

[plot: RMS error vs. matrix size n, comparing "recovery error using SDP" with 1.68 × oracle error = 1.68 · [(2nr − r²)/(pn²)]^{1/2}]

• empirically, convex relaxation ≈ 1.68 × oracle bound

Existing theory for convex relaxation does not match practice . . .

slide-18
SLIDE 18

[excerpt from Candès, Plan '09, around eq. (III.9): ". . . with adversarial noise. Consequently, our analysis loses a pn factor vis-à-vis an optimal bound that is achievable via the help of an oracle. The diligent reader may argue that the least-squares . . ."]

Existing theory for convex relaxation does not match practice . . .

slide-19
SLIDE 19

What are the roadblocks?

Strategy: M_cvx is the optimizer if there exists a dual certificate W s.t. (M_cvx, W) obeys the KKT optimality condition

10/ 39

slide-20
SLIDE 20

What are the roadblocks?

Strategy: M_cvx is the optimizer if there exists a dual certificate W s.t. (M_cvx, W) obeys the KKT optimality condition

• noiseless case: M_cvx ← M⋆ (exact recovery); W ← golfing scheme (David Gross)

10/ 39

slide-21
SLIDE 21

What are the roadblocks?

Strategy: M_cvx is the optimizer if there exists a dual certificate W s.t. (M_cvx, W) obeys the KKT optimality condition

• noiseless case: M_cvx ← M⋆ (exact recovery); W ← golfing scheme (David Gross)
• noisy case: M_cvx is very complicated; hard to construct W . . .

10/ 39

slide-22
SLIDE 22

dual certification (golfing scheme)

slide-23
SLIDE 23

nonconvex optimization dual certification (golfing scheme)

slide-24
SLIDE 24

A detour: nonconvex optimization

Burer–Monteiro: represent Z by XY⊤ with low-rank factors X, Y ∈ ℝ^{n×r}

12/ 39

slide-25
SLIDE 25

A detour: nonconvex optimization

Burer–Monteiro: represent Z by XY⊤ with low-rank factors X, Y ∈ ℝ^{n×r}

nonconvex approach:

    minimize_{X, Y ∈ ℝ^{n×r}}  f(X, Y) = Σ_{(i,j)∈Ω} ((XY⊤)_{i,j} − M_{i,j})² + reg(X, Y)      (squared loss + regularizer)

12/ 39

slide-26
SLIDE 26

A detour: nonconvex optimization

  • Burer, Monteiro ’03
  • Rennie, Srebro ’05
  • Keshavan, Montanari, Oh ’09 ’10
  • Jain, Netrapalli, Sanghavi ’12
  • Hardt ’13
  • Sun, Luo ’14
  • Chen, Wainwright ’15
  • Tu, Boczar, Simchowitz, Soltanolkotabi, Recht ’15
  • Zhao, Wang, Liu ’15
  • Zheng, Lafferty ’16
  • Yi, Park, Chen, Caramanis ’16
  • Ge, Lee, Ma ’16
  • Ge, Jin, Zheng ’17
  • Ma, Wang, Chi, Chen ’17
  • Chen, Li ’18
  • Chen, Liu, Li ’19
  • ...

13/ 39

slide-27
SLIDE 27

A detour: nonconvex optimization

minimize_{X, Y ∈ ℝ^{n×r}}  f(X, Y) = Σ_{(i,j)∈Ω} ((XY⊤)_{i,j} − M_{i,j})² + reg(X, Y)

• suitable initialization: (X⁰, Y⁰)
• gradient descent: for t = 0, 1, . . .
      Xᵗ⁺¹ = Xᵗ − ηₜ ∇_X f(Xᵗ, Yᵗ)
      Yᵗ⁺¹ = Yᵗ − ηₜ ∇_Y f(Xᵗ, Yᵗ)

14/ 39
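A minimal sketch of this two-stage recipe (spectral initialization followed by vanilla gradient descent, with reg ≡ 0); the step size η, iteration count T, and problem sizes are my own illustrative choices, not values from the talk.

```python
import numpy as np

def grad_descent_mc(M_obs, mask, r, eta=0.005, T=1000):
    """Vanilla GD on f(X, Y) = sum over observed (i,j) of ((X Y^T)_ij - M_ij)^2."""
    p = mask.mean()                                  # estimated sampling rate
    # spectral initialization: top-r SVD of the rescaled zero-filled observations
    U, s, Vt = np.linalg.svd(mask * M_obs / p, full_matrices=False)
    X = U[:, :r] * np.sqrt(s[:r])
    Y = Vt[:r, :].T * np.sqrt(s[:r])
    for _ in range(T):
        R = mask * (X @ Y.T - M_obs)                 # residual on observed entries
        gX, gY = 2 * R @ Y, 2 * R.T @ X              # gradients of the squared loss
        X, Y = X - eta * gX, Y - eta * gY
    return X, Y

# usage on synthetic data
rng = np.random.default_rng(1)
n, r, p, sigma = 200, 3, 0.2, 0.01
M_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = (rng.random((n, n)) < p).astype(float)
M_obs = M_star + sigma * rng.standard_normal((n, n))
X, Y = grad_descent_mc(M_obs, mask, r)
print(np.linalg.norm(X @ Y.T - M_star, "fro") / np.linalg.norm(M_star, "fro"))
```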

slide-28
SLIDE 28

A detour: nonconvex optimization

  • random sampling: each (i, j) ∈ Ω with prob. p
  • random noise: i.i.d. sub-Gaussian noise with variance σ2
  • true matrix M ⋆ ∈ Rn×n: r = O(1), incoherent, . . .

15/ 39

slide-29
SLIDE 29

• minimax limit: σ √(n/p)
• nonconvex algorithms: σ √(n/p) (optimal!)

[plot: estimation error ‖M − M⋆‖_F vs. noise standard deviation σ]

slide-30
SLIDE 30

[timeline figure, 2008 to 2019: statistical guarantees for the convex program (Candès, Recht '08; Candès, Plan '09; Negahban, Wainwright '10; Koltchinskii et al. '10; Gross '09)]

slide-31
SLIDE 31

[timeline figure, 2008 to 2019: convex relaxation (Candès, Recht '08; Candès, Plan '09; Negahban, Wainwright '10; Koltchinskii, Tsybakov, Lounici '10; Gross '09) and nonconvex optimization (Keshavan, Montanari, Oh '09; Sun, Luo '15; Chen, Wainwright '15; Zheng, Lafferty '16; Ma, Wang, Chi, Chen '17; Chen, Liu, Li '19)]

(X, Y) is a critical point of nonconvex optimization


slide-33
SLIDE 33

An interesting experiment

convex:     minimize_{Z ∈ ℝ^{n×n}}   Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*

nonconvex:  minimize_{X, Y ∈ ℝ^{n×r}}   Σ_{(i,j)∈Ω} ((XY⊤)_{i,j} − M_{i,j})² + (λ/2)‖X‖_F² + (λ/2)‖Y‖_F²      (the last two terms are reg(X, Y))

recall: ‖Z‖_* = min_{Z = XY⊤} ½‖X‖_F² + ½‖Y‖_F²

18/ 39
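A quick numerical sanity check (illustrative, not from the talk) of the variational identity above: the minimum is attained at the balanced factorization X = U√Σ, Y = V√Σ from the SVD Z = UΣV⊤.

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 6))   # a rank-4 matrix
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
X = U * np.sqrt(s)            # balanced factors: columns scaled by sqrt of singular values
Y = Vt.T * np.sqrt(s)

print(np.allclose(X @ Y.T, Z))                                   # the factorization is exact
print(np.linalg.norm(Z, "nuc"))                                  # nuclear norm of Z
print(0.5 * np.linalg.norm(X, "fro")**2 + 0.5 * np.linalg.norm(Y, "fro")**2)  # matches
```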

slide-34
SLIDE 34

A motivating experiment

n = 1000, r = 5, p = 0.2, λ = 5σ√np

[log-log plot vs. noise standard deviation σ: estimation error of convex, estimation error of nonconvex, and distance between the two solutions]

19/ 39

slide-35
SLIDE 35

A motivating experiment

n = 1000, r = 5, p = 0.2, λ = 5σ√np

[log-log plot vs. noise standard deviation σ: estimation error of convex, estimation error of nonconvex, and distance between the two solutions]

Convex and nonconvex solutions are exceedingly close!

19/ 39

slide-36
SLIDE 36

[diagram: optimizer stability, relating the convex solution and nonconvex optimization]

slide-37
SLIDE 37

Main results: r = O(1)

• random sampling: each (i, j) ∈ Ω with prob. p ≳ log³n / n
• random noise: i.i.d. sub-Gaussian noise with variance σ²
• true matrix M⋆ ∈ ℝ^{n×n}: r = O(1), incoherent, well-conditioned

21/ 39

slide-38
SLIDE 38

Main results: r = O(1)

• random sampling: each (i, j) ∈ Ω with prob. p ≳ log³n / n
• random noise: i.i.d. sub-Gaussian noise with variance σ²
• true matrix M⋆ ∈ ℝ^{n×n}: r = O(1), incoherent, well-conditioned

minimize_{Z ∈ ℝ^{n×n}}  Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*      (λ ≍ σ√(np))

21/ 39

slide-39
SLIDE 39

Main results: r = O(1)

• random sampling: each (i, j) ∈ Ω with prob. p ≳ log³n / n
• random noise: i.i.d. sub-Gaussian noise with variance σ²
• true matrix M⋆ ∈ ℝ^{n×n}: r = O(1), incoherent, well-conditioned

minimize_{Z ∈ ℝ^{n×n}}  Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*      (λ ≍ σ√(np))

Theorem 1 (Chen, Chi, Fan, Ma, Yan '19). With high prob., any minimizer M_cvx of the convex program obeys
1. M_cvx is nearly rank-r:  ‖M_cvx − proj_r(M_cvx)‖_F ≪ (1/n⁵) · σ√(n/p)
21/ 39

slide-40
SLIDE 40

Main results: r = O(1)

• random sampling: each (i, j) ∈ Ω with prob. p ≳ log³n / n
• random noise: i.i.d. sub-Gaussian noise with variance σ²
• true matrix M⋆ ∈ ℝ^{n×n}: r = O(1), incoherent, well-conditioned

minimize_{Z ∈ ℝ^{n×n}}  Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*      (λ ≍ σ√(np))

Theorem 1 (Chen, Chi, Fan, Ma, Yan '19). With high prob., any minimizer M_cvx of the convex program obeys
1. M_cvx is nearly rank-r
2. ‖M_cvx − M⋆‖_F ≲ σ√(n/p)

21/ 39

slide-41
SLIDE 41

Main results: r = O(1)

• random sampling: each (i, j) ∈ Ω with prob. p ≳ log³n / n
• random noise: i.i.d. sub-Gaussian noise with variance σ²
• true matrix M⋆ ∈ ℝ^{n×n}: r = O(1), incoherent, well-conditioned

minimize_{Z ∈ ℝ^{n×n}}  Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*      (λ ≍ σ√(np))

Theorem 1 (Chen, Chi, Fan, Ma, Yan '19). With high prob., any minimizer M_cvx of the convex program obeys
1. M_cvx is nearly rank-r
2. ‖M_cvx − M⋆‖_F ≲ σ√(n/p)
3. ‖M_cvx − M⋆‖_∞ ≲ σ√((n log n)/p) · (1/n)

21/ 39

slide-42
SLIDE 42
‖M_cvx − M⋆‖_F ≲ σ√(n/p)      (Chen, Chi, Fan, Ma, Yan '19)

[plot: estimation error ‖M − M⋆‖_F vs. noise standard deviation σ, matching the minimax limit]

• minimax optimal estimation error

22/ 39

slide-43
SLIDE 43
‖M_cvx − M⋆‖_F ≲ σ√(n/p),    ‖M_cvx − M⋆‖_∞ ≲ σ√((n log n)/p) · (1/n)      (Chen, Chi, Fan, Ma, Yan '19)

[plot: estimation error ‖M − M⋆‖_F vs. noise standard deviation σ, matching the minimax limit]

• minimax optimal estimation error
• estimation errors are spread out across all entries

22/ 39

slide-44
SLIDE 44

Implicit regularization

No need to enforce the spikiness constraint used by Negahban & Wainwright:

    minimize_{Z: ‖Z‖_∞ ≤ α}  Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*

• convex programming automatically controls the spikiness of its solutions

23/ 39

slide-45
SLIDE 45

Statistical guarantees for iterative algorithms

minimize_Z  g(Z) = Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*

Many algorithms (e.g. SVT, SOFT-IMPUTE, FPC, FISTA) have been proposed to minimize g(Z), typically without statistical guarantees

24/ 39

slide-46
SLIDE 46

Statistical guarantees for iterative algorithms

minimize_Z  g(Z) = Σ_{(i,j)∈Ω} (Z_{i,j} − M_{i,j})² + λ‖Z‖_*

Many algorithms (e.g. SVT, SOFT-IMPUTE, FPC, FISTA) have been proposed to minimize g(Z), typically without statistical guarantees.

We provide statistical guarantees for any Z with g(Z) ≤ g(Z_opt) + ε, for some sufficiently small ε > 0

24/ 39
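A small helper (illustrative sketch, names are my own) for evaluating the condition above on any candidate Z returned by an iterative solver:

```python
import numpy as np

def g(Z, M_obs, mask, lam):
    """Objective of the convex program: squared loss on observed entries + nuclear-norm penalty."""
    return np.sum((mask * (Z - M_obs))**2) + lam * np.linalg.norm(Z, "nuc")

# A candidate Z (e.g. the output of SVT or SOFT-IMPUTE) is covered by the guarantee
# whenever g(Z, ...) <= g(Z_opt, ...) + eps for a sufficiently small eps > 0.
```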

slide-47
SLIDE 47

Main results: general case

• random sampling: each (i, j) ∈ Ω with prob. p ≳ r² log³n / n
• random noise: i.i.d. sub-Gaussian noise with variance σ²
• true matrix M⋆ ∈ ℝ^{n×n}: incoherent, well-conditioned

25/ 39

slide-48
SLIDE 48

Main results: general case

• random sampling: each (i, j) ∈ Ω with prob. p ≳ r² log³n / n
• random noise: i.i.d. sub-Gaussian noise with variance σ²
• true matrix M⋆ ∈ ℝ^{n×n}: incoherent, well-conditioned

Theorem 2 (Chen, Chi, Fan, Ma, Yan '19). With high prob., any minimizer M_cvx of the convex program obeys
1. M_cvx is nearly rank-r
2. ‖M_cvx − M⋆‖_F ≲ (σ/σ_min(M⋆)) √(n/p) · ‖M⋆‖_F
   ‖M_cvx − M⋆‖_∞ ≲ √r · (σ/σ_min(M⋆)) √((n log n)/p) · ‖M⋆‖_∞
   ‖M_cvx − M⋆‖ ≲ (σ/σ_min(M⋆)) √(n/p) · ‖M⋆‖

25/ 39

slide-49
SLIDE 49

Main results: general case

• random sampling: each (i, j) ∈ Ω with prob. p ≳ r² log³n / n
• random noise: i.i.d. sub-Gaussian noise with variance σ²
• true matrix M⋆ ∈ ℝ^{n×n}: incoherent, well-conditioned

The sample complexity bound O(nr² log³ n) is suboptimal in r!

25/ 39

slide-50
SLIDE 50

A little analysis: connection between convex and nonconvex solutions

slide-51
SLIDE 51

Link between convex and nonconvex optimizers

(X, Y ) is nonconvex optimizer

27/ 39

slide-52
SLIDE 52

Link between convex and nonconvex optimizers

(X, Y) is nonconvex optimizer   =?⇒   XY⊤ is the convex solution

27/ 39

slide-53
SLIDE 53

Link between convex and nonconvex optimizers

Suppose that
• λ is properly chosen
• (X, Y) is close to the truth (in the ℓ_{2,∞} sense)

Then: (X, Y) is nonconvex optimizer   ⇒   XY⊤ is the convex solution,

i.e. dist(convex solution, nonconvex solution) = 0

27/ 39

slide-54
SLIDE 54

Approximate nonconvex optimizers

[diagram: a sequence of nonconvex iterates (X⁰, Y⁰), (X¹, Y¹), . . . and an optimizer (X, Y)]

Issue: we do NOT know the properties of nonconvex optimizers
• it is unclear whether nonconvex algorithms converge to the optimizers (due to lack of strong convexity)

28/ 39

slide-55
SLIDE 55

Approximate nonconvex optimizers

Strategy: resort to "approximate stationary points" (∇f(X, Y) ≈ 0) instead

29/ 39

slide-56
SLIDE 56

Approximate nonconvex optimizers

Strategy: resort to "approximate stationary points" (∇f(X, Y) ≈ 0) instead

Suppose that
• λ is properly chosen
• (X, Y) is close to the truth (in the ℓ_{2,∞} sense)

Then: ∇f(X, Y) ≈ 0   ⇒   dist(XY⊤, convex solutions) ≈ 0

29/ 39

slide-57
SLIDE 57

Construct approximate nonconvex optimizers via GD

starting from (X⁰, Y⁰) = truth or spectral initialization:

    Xᵗ⁺¹ = Xᵗ − η ∇_X f(Xᵗ, Yᵗ)
    Yᵗ⁺¹ = Yᵗ − η ∇_Y f(Xᵗ, Yᵗ),      t = 0, 1, · · · , T

30/ 39

slide-58
SLIDE 58

Construct approximate nonconvex optimizers via GD

starting from (X⁰, Y⁰) = truth or spectral initialization:

    Xᵗ⁺¹ = Xᵗ − η ∇_X f(Xᵗ, Yᵗ)
    Yᵗ⁺¹ = Yᵗ − η ∇_Y f(Xᵗ, Yᵗ),      t = 0, 1, · · · , T

• when T is large: there exists an iterate with very small gradient, ‖∇f(X, Y)‖_F ≲ 1/(ηT)

30/ 39

slide-59
SLIDE 59

Construct approximate nonconvex optimizers via GD

starting from (X⁰, Y⁰) = truth or spectral initialization:

    Xᵗ⁺¹ = Xᵗ − η ∇_X f(Xᵗ, Yᵗ)
    Yᵗ⁺¹ = Yᵗ − η ∇_Y f(Xᵗ, Yᵗ),      t = 0, 1, · · · , T

• when T is large: there exists an iterate with very small gradient, ‖∇f(X, Y)‖_F ≲ 1/(ηT)
• hopefully not far from (X⋆, Y⋆) (in the ℓ_{2,∞} sense in particular)

30/ 39
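A minimal sketch (illustrative; the step size and iteration count are my own) of how one would extract such an approximate stationary point in practice: run vanilla GD and keep the iterate whose gradient has the smallest Frobenius norm.

```python
import numpy as np

def smallest_gradient_iterate(M_obs, mask, X0, Y0, eta=0.005, T=1000):
    """Run T GD steps on the factorized squared loss; return the iterate with min ||grad f||_F."""
    X, Y = X0.copy(), Y0.copy()
    best, best_norm = (X.copy(), Y.copy()), np.inf
    for _ in range(T):
        R = mask * (X @ Y.T - M_obs)                     # residual on observed entries
        gX, gY = 2 * R @ Y, 2 * R.T @ X
        g_norm = np.sqrt(np.sum(gX**2) + np.sum(gY**2))  # ||grad f(X, Y)||_F
        if g_norm < best_norm:
            best, best_norm = (X.copy(), Y.copy()), g_norm
        X, Y = X - eta * gX, Y - eta * gY
    return best, best_norm                               # best_norm is roughly O(1/(eta*T))
```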

slide-60
SLIDE 60

Gradient descent for nonconvex matrix completion

slide-61
SLIDE 61

Gradient descent for nonconvex matrix completion

Xᵗ⁺¹ = Xᵗ − η ∇_X f(Xᵗ, Yᵗ),    Yᵗ⁺¹ = Yᵗ − η ∇_Y f(Xᵗ, Yᵗ)

Prior works analyze regularized GD

  • not guaranteed to return small-gradient solutions
  • no ℓ2,∞ error control

— Keshavan et al. ’09, Sun, Luo ’15, Chen, Wainwright ’15, Zheng, Lafferty ’16

32/ 39

slide-62
SLIDE 62

Gradient descent for nonconvex matrix completion

Xᵗ⁺¹ = Xᵗ − η ∇_X f(Xᵗ, Yᵗ),    Yᵗ⁺¹ = Yᵗ − η ∇_Y f(Xᵗ, Yᵗ)

Our work and Chen et al. analyze vanilla GD

  • regularization-free
  • optimal ℓ2,∞ error control

— Ma, Wang, Chi, Chen ’17, Chen, Liu, Li ’19

32/ 39

slide-63
SLIDE 63

Gradient descent theory revisited

Two standard conditions that enable geometric convergence of GD

33/ 39

slide-64
SLIDE 64

Gradient descent theory revisited

Two standard conditions that enable geometric convergence of GD

  • (local) restricted strong convexity

33/ 39

slide-65
SLIDE 65

Gradient descent theory revisited

Two standard conditions that enable geometric convergence of GD

  • (local) restricted strong convexity
  • (local) smoothness

33/ 39

slide-66
SLIDE 66

Gradient descent theory revisited

f is said to be α-strongly convex and β-smooth if

    0 ≺ αI ⪯ ∇²f(X) ⪯ βI,    ∀X

ℓ₂ error contraction: GD with η = 1/β obeys

    ‖Xᵗ⁺¹ − X⋆‖_F ≤ (1 − α/β) ‖Xᵗ − X⋆‖_F

34/ 39
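A tiny numerical illustration (mine, not from the slides) of this contraction on a strongly convex quadratic f(x) = ½ xᵀAx, whose Hessian A satisfies αI ⪯ A ⪯ βI; each GD step with η = 1/β shrinks the distance to the minimizer by at least a factor 1 − α/β.

```python
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
eigs = np.array([1.0, 2.0, 3.0, 4.0, 10.0])       # alpha = 1, beta = 10
A = Q @ np.diag(eigs) @ Q.T                        # Hessian of f(x) = 0.5 x^T A x
alpha, beta = eigs.min(), eigs.max()

x = rng.standard_normal(5)                         # minimizer is x_star = 0
eta = 1.0 / beta
for _ in range(5):
    prev = np.linalg.norm(x)
    x = x - eta * (A @ x)                          # gradient step: grad f(x) = A x
    print(f"ratio {np.linalg.norm(x) / prev:.3f} <= 1 - alpha/beta = {1 - alpha/beta:.3f}")
```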

slide-67
SLIDE 67

Incoherence region

Which region enjoys both restricted strong convexity and smoothness?

35/ 39

slide-68
SLIDE 68

Incoherence region

Which region enjoys both restricted strong convexity and smoothness?

[diagram: a ball around X⋆]

• X is not far away from X⋆

35/ 39

slide-69
SLIDE 69

Incoherence region

Which region enjoys both restricted strong convexity and smoothness?

[diagram: a ball around X⋆ intersected with incoherence constraints along e₁]

    ‖e₁⊤(X − X⋆)‖₂ ≤ ‖X⋆‖_{2,∞}

• X is not far away from X⋆
• X is incoherent w.r.t. standard basis vectors (incoherence region)

35/ 39

slide-70
SLIDE 70

Incoherence region

Which region enjoys both restricted strong convexity and smoothness?

[diagram: a ball around X⋆ intersected with incoherence constraints along e₁ and e₂]

    ‖e₁⊤(X − X⋆)‖₂ ≤ ‖X⋆‖_{2,∞},    ‖e₂⊤(X − X⋆)‖₂ ≤ ‖X⋆‖_{2,∞}

• X is not far away from X⋆
• X is incoherent w.r.t. standard basis vectors (incoherence region)

35/ 39

slide-71
SLIDE 71

Inadequacy of generic gradient descent theory

[diagram: region of local strong convexity + smoothness vs. the incoherence region]

• Generic optimization theory does NOT ensure GD stays in the incoherence region

36/ 39


slide-74
SLIDE 74

Inadequacy of generic gradient descent theory

[diagram: region of local strong convexity + smoothness vs. the incoherence region]

• Generic optimization theory does NOT ensure GD stays in the incoherence region
• Demonstrating incoherence calls for new analysis tools

36/ 39

slide-75
SLIDE 75

Key proof idea: leave-one-out analysis

For each 1 ≤ l ≤ n, introduce leave-one-out iterates X^{t,(l)} obtained by replacing the l-th row and column of the data with their true values

[diagram: the leave-one-out data matrix M^{(l)} and its iterates X^{t,(l)}]

37/ 39
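A minimal sketch (my own illustration of the construction described above) of the leave-one-out data: it agrees with the observed data off the l-th row/column and uses the noiseless truth there, so the resulting iterates are independent of the randomness in that row/column.

```python
import numpy as np

def leave_one_out_data(M_obs, mask, M_star, l):
    """Build (M^(l), mask^(l)): row/column l replaced by true, fully observed values."""
    M_l, mask_l = M_obs.copy(), mask.copy()
    M_l[l, :], M_l[:, l] = M_star[l, :], M_star[:, l]    # plug in the truth on row/col l
    mask_l[l, :], mask_l[:, l] = 1.0, 1.0                # treat row/col l as fully observed
    return M_l, mask_l

# Running the same GD routine on (M^(l), mask^(l)) produces the auxiliary iterates X^{t,(l)}
# used in the analysis; they depend on the data only outside row/column l.
```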

slide-76
SLIDE 76

Key proof idea: leave-one-out analysis

[diagram: incoherence region w.r.t. e_l, containing the leave-one-out iterates {X^{t,(l)}}]

• leave-one-out iterates {X^{t,(l)}} contain more information about the l-th row of the truth; independent of the randomness in the l-th row

38/ 39

slide-77
SLIDE 77

Key proof idea: leave-one-out analysis

[diagram: incoherence region w.r.t. e_l, containing both the true iterates {Xᵗ} and the leave-one-out iterates {X^{t,(l)}}]

• leave-one-out iterates {X^{t,(l)}} contain more information about the l-th row of the truth; independent of the randomness in the l-th row
• leave-one-out iterates {X^{t,(l)}} ≈ true iterates {Xᵗ}

38/ 39

slide-78
SLIDE 78

[timeline figure, 2008 to 2019 (recap): convex relaxation (Candès, Recht '08; Candès, Plan '09; Negahban, Wainwright '10; Koltchinskii, Tsybakov, Lounici '10; Gross '09; Chen, Chi, Fan, Ma, Yan '19) and nonconvex optimization (Keshavan, Montanari, Oh '09; Sun, Luo '15; Chen, Wainwright '15; Zheng, Lafferty '16; Ma, Wang, Chi, Chen '17; Chen, Liu, Li '19); "(X, Y) is critical point of nonconvex optimization"]

“Noisy matrix completion: understanding statistical guarantees for convex relaxation via nonconvex optimization”, Y. Chen, Y. Chi, J. Fan, C. Ma, Y. Yan, 2019