 
Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization

Yuxin Chen (EE, Princeton University), Cong Ma (Princeton ORFE), Yuejie Chi (CMU ECE), Jianqing Fan (Princeton ORFE), Yuling Yan (Princeton ORFE)
Convex relaxation for low-rank structure

$$\operatorname*{minimize}_{Z}\ \|Z\|_* \quad \text{subject to noiseless data constraints}$$

[Figure: low-rank matrix and its semidefinite relaxation; figure credit: Piet Mondrian]

✓ matrix sensing (Recht, Fazel, Parrilo ’07)
✓ phase retrieval (Candès, Strohmer, Voroninski ’11; Candès, Li ’12)
✓ matrix completion (Candès, Recht ’08; Candès, Tao ’08; Gross ’09)
✓ robust PCA (Chandrasekaran et al. ’09; Candès et al. ’09)
✓ Hankel matrix completion (Fazel et al. ’13; Chen, Chi ’13; Cai et al. ’15)
✓ blind deconvolution (Ahmed, Recht, Romberg ’12; Ling, Strohmer ’15)
✓ joint alignment / matching (Chen, Huang, Guibas ’14)
…
Stability of convex relaxation against noise

$$\operatorname*{minimize}_{Z}\ \|Z\|_* \quad \text{subject to noisy data constraints}$$

or, in penalized form,

$$\operatorname*{minimize}_{Z}\ \underbrace{f(Z;\text{data})}_{\text{empirical loss}} + \lambda\|Z\|_*$$
✓ matrix sensing (RIP measurements) (Candès, Plan ’10)
✓ phase retrieval (Gaussian measurements) (Candès et al. ’11)
? matrix completion, the topic of this talk (Candès, Plan ’09; Negahban, Wainwright ’10; Koltchinskii et al. ’10)
? robust PCA (Zhou, Li, Wright, Candès, Ma ’10)
? Hankel matrix completion (Chen, Chi ’13)
? blind deconvolution (Ahmed, Recht, Romberg ’12; Ling, Strohmer ’15)
? joint alignment / matching
…
Low-rank matrix completion

[Figure: partially observed matrix; ✓ marks an observed entry, ? a missing one. Figure credit: E. J. Candès]

Given partial samples of a low-rank matrix $M^\star$, fill in the missing entries.
Applications: recommendation systems, localization, channel estimation, shape matching
Noisy low-rank matrix completion

observations: $M_{i,j} = M^\star_{i,j} + \text{noise}_{i,j}$, $(i,j) \in \Omega$
goal: estimate $M^\star$

[Figure: unknown rank-$r$ matrix $M^\star \in \mathbb{R}^{n\times n}$ and sampling set $\Omega$]

convex relaxation:

$$\operatorname*{minimize}_{Z \in \mathbb{R}^{n\times n}}\ \underbrace{\sum_{(i,j)\in\Omega} \big(Z_{i,j} - M_{i,j}\big)^2}_{\text{squared loss}} + \lambda\|Z\|_*$$
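The convex program above can be solved by proximal gradient descent. The following is a minimal NumPy sketch, not the authors' solver. The squared loss has gradient $2\mathcal{P}_\Omega(Z - M)$, which is 2-Lipschitz, so with step size $1/2$ each iteration becomes singular value thresholding (SVT) at level $\lambda/2$ applied to $\mathcal{P}_\Omega(M) + \mathcal{P}_{\Omega^c}(Z)$ (the soft-impute update). The helper names `svt` and `convex_mc`, the zero initialization, and the fixed iteration count are our own assumptions.

```python
import numpy as np


def svt(Z, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def convex_mc(M_obs, mask, lam, iters=500):
    """Proximal gradient (hypothetical helper) for
        minimize_Z  sum_{(i,j) in Omega} (Z_ij - M_ij)^2 + lam * ||Z||_*.

    M_obs: n x n array; values outside the mask are ignored
    mask:  n x n boolean array, True for (i, j) in Omega
    """
    Z = np.zeros(M_obs.shape)
    for _ in range(iters):
        # Gradient step with step size 1/2: Z - P_Omega(Z - M) = P_Omega(M) + P_Omega^c(Z)
        Z_half = np.where(mask, M_obs, Z)
        # Proximal step for lam * ||.||_* with the same step size 1/2
        Z = svt(Z_half, lam / 2)
    return Z
```

With a very small $\lambda$ this iteration can converge slowly; warm-starting along a decreasing sequence of $\lambda$ values is a common remedy.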
Prior statistical guarantees for convex relaxation
• random sampling: each $(i,j) \in \Omega$ independently with probability $p$
• random noise: i.i.d. sub-Gaussian noise with variance $\sigma^2$
• true matrix $M^\star \in \mathbb{R}^{n\times n}$: rank $r = O(1)$, incoherent, …
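The statistical model above can be simulated in a few lines. This NumPy sketch is our own illustration, not code from the talk. It draws $M^\star = UV^\top$ with i.i.d. Gaussian factors (a choice that typically yields an incoherent matrix), uses Gaussian noise as one example of sub-Gaussian noise, and includes each entry in $\Omega$ independently with probability $p$. The function name `sample_noisy_mc` is hypothetical.

```python
import numpy as np


def sample_noisy_mc(n, r, p, sigma, seed=0):
    """Draw one instance of the noisy matrix completion model (hypothetical helper).

    Assumptions: Gaussian factors for M_star, Gaussian noise with std sigma,
    Bernoulli(p) sampling of each entry.
    """
    rng = np.random.default_rng(seed)
    M_star = rng.standard_normal((n, r)) @ rng.standard_normal((n, r)).T  # rank r
    mask = rng.random((n, n)) < p              # (i, j) in Omega with probability p
    noise = sigma * rng.standard_normal((n, n))
    M_obs = np.where(mask, M_star + noise, 0.0)  # unobserved entries set to 0
    return M_star, mask, M_obs
```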
[Figure: estimation error $\|\widehat{M} - M^\star\|_F$ versus noise standard deviation $\sigma$, comparing prior bounds with the minimax limit]
• minimax limit: $\sigma\sqrt{n/p}$
• Candès, Plan ’09: $\sigma n^{1.5}$
• Negahban, Wainwright ’10: $\max\{\sigma, \|M^\star\|_\infty\}\sqrt{n/p}$
• Koltchinskii, Tsybakov, Lounici ’10: $\max\{\sigma, \|M^\star\|_\infty\}\sqrt{n/p}$
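To see how far apart these scalings are, one can evaluate them directly. The sketch below is our own helper and treats each expression exactly as written above, without constants. The ratio of the Candès–Plan scaling to the minimax limit is $\sigma n^{1.5} / (\sigma\sqrt{n/p}) = n\sqrt{p}$, and the Negahban–Wainwright and Koltchinskii–Tsybakov–Lounici scalings exceed the minimax limit by $\max\{1, \|M^\star\|_\infty/\sigma\}$, which is large when the noise is small.

```python
import math


def error_bounds(n, p, sigma, m_inf):
    """Evaluate the four error scalings listed above, as written (no constants).

    m_inf stands for ||M_star||_inf, the largest entry magnitude of M_star.
    """
    root = math.sqrt(n / p)
    return {
        "minimax": sigma * root,
        "candes_plan_09": sigma * n ** 1.5,
        "negahban_wainwright_10": max(sigma, m_inf) * root,
        "koltchinskii_tsybakov_lounici_10": max(sigma, m_inf) * root,
    }


b = error_bounds(n=1000, p=0.1, sigma=0.01, m_inf=1.0)
print(b["candes_plan_09"] / b["minimax"])          # n * sqrt(p), about 316
print(b["negahban_wainwright_10"] / b["minimax"])  # ||M_star||_inf / sigma, about 100 here
```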
[Figure (recovery error using SDP): RMS error of convex relaxation versus $n$ (100 to 1000), compared with $1.68 \times$ oracle bound $1.68\,[(2nr - r^2)/(pn^2)]^{1/2}$]
Existing theory for convex relaxation does not match practice.
Excerpt: "… with adversarial noise. Consequently, our analysis loses a $\sqrt{n}$ factor vis-à-vis an optimal bound that is achievable via the help of an oracle."
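The oracle reference curve in the plot can be evaluated directly. In $1.68\,[(2nr - r^2)/(pn^2)]^{1/2}$, the numerator $2nr - r^2$ counts the degrees of freedom of an $n \times n$ rank-$r$ matrix, and $pn^2$ is the expected number of observed entries. The sketch below evaluates the formula exactly as plotted (it carries no explicit $\sigma$); the helper name is hypothetical.

```python
import math


def oracle_rms_bound(n, r, p, factor=1.68):
    """factor * sqrt((2nr - r^2) / (p n^2)), the plotted oracle reference curve.

    2nr - r^2: degrees of freedom of an n x n rank-r matrix
    p n^2:     expected number of observed entries under Bernoulli(p) sampling
    """
    dof = 2 * n * r - r ** 2
    return factor * math.sqrt(dof / (p * n ** 2))
```

For example, with $n = 100$, $r = 2$, $p = 0.5$ it gives $1.68\sqrt{396/5000} \approx 0.47$, and the value shrinks roughly like $1/\sqrt{n}$ as $n$ grows.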
What are the roadblocks?
Strategy: $\widehat{M}_{\mathrm{cvx}}$ is an optimizer if there exists a dual certificate $W$ such that $(\widehat{M}_{\mathrm{cvx}}, W)$ obeys the KKT optimality condition.
• noiseless case: $\widehat{M}_{\mathrm{cvx}} \leftarrow M^\star$ (exact recovery); $W \leftarrow$ golfing scheme (David Gross)
• noisy case: $\widehat{M}_{\mathrm{cvx}}$ is very complicated, so it is hard to construct $W$ …
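For the squared-loss program, the KKT condition can be written out explicitly. A sketch, assuming $\widehat{M}_{\mathrm{cvx}} = U\Sigma V^\top$ is a compact SVD, $\mathcal{P}_\Omega$ keeps the entries in $\Omega$ and zeroes the rest, and $\|R\|$ denotes the spectral norm (this uses the standard subdifferential of the nuclear norm; the explicit form is our addition):

```latex
2\,\mathcal{P}_\Omega\big(M - \widehat{M}_{\mathrm{cvx}}\big) = \lambda W,
\qquad
W \in \partial\|\widehat{M}_{\mathrm{cvx}}\|_*
  = \big\{\, UV^\top + R \;:\; U^\top R = 0,\ RV = 0,\ \|R\| \le 1 \,\big\}.
```

In the noisy case the unknown $\widehat{M}_{\mathrm{cvx}}$ appears on both sides, through $\mathcal{P}_\Omega(M - \widehat{M}_{\mathrm{cvx}})$ and through $U$, $V$, which is what makes a direct construction of $W$ hard.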
dual certification (golfing scheme) + nonconvex optimization
A detour: nonconvex optimization
Burer–Monteiro: represent $Z$ by $XY^\top$ with low-rank factors $X, Y \in \mathbb{R}^{n\times r}$

$$\operatorname*{minimize}_{X, Y \in \mathbb{R}^{n\times r}}\ f(X,Y) = \underbrace{\sum_{(i,j)\in\Omega} \big((XY^\top)_{i,j} - M_{i,j}\big)^2}_{\text{squared loss}} + \operatorname{reg}(X,Y)$$
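The choice of $\operatorname{reg}(X,Y)$ is left open here. One standard identity that links the factored objective back to the nuclear norm is $\|Z\|_* = \min_{XY^\top = Z} \tfrac12(\|X\|_F^2 + \|Y\|_F^2)$ (over factors with at least $\operatorname{rank}(Z)$ columns), attained by the balanced factors $X = U\Sigma^{1/2}$, $Y = V\Sigma^{1/2}$ from an SVD $Z = U\Sigma V^\top$. It motivates, as an assumption on our part, $\operatorname{reg}(X,Y) = \tfrac{\lambda}{2}(\|X\|_F^2 + \|Y\|_F^2)$. The NumPy sketch below checks the identity numerically at the balanced factors; `balanced_factors` and `half_frobenius_sq` are hypothetical helpers.

```python
import numpy as np


def balanced_factors(Z, r):
    """Return X = U_r S_r^{1/2} and Y = V_r S_r^{1/2} from the top-r SVD of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    root = np.sqrt(s[:r])
    return U[:, :r] * root, Vt[:r].T * root


def half_frobenius_sq(X, Y):
    """(1/2)(||X||_F^2 + ||Y||_F^2); the assumed reg(X, Y) with lambda = 1."""
    return 0.5 * (np.linalg.norm(X) ** 2 + np.linalg.norm(Y) ** 2)


rng = np.random.default_rng(0)
Z = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 30))  # rank-3 test matrix
X, Y = balanced_factors(Z, 3)
print(np.allclose(X @ Y.T, Z))                                        # factors reproduce Z
print(np.isclose(half_frobenius_sq(X, Y), np.linalg.norm(Z, "nuc")))  # identity is attained
print(half_frobenius_sq(2 * X, Y / 2) > np.linalg.norm(Z, "nuc"))     # unbalanced factors cost more
```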
• Burer, Monteiro ’03
• Rennie, Srebro ’05
• Keshavan, Montanari, Oh ’09 ’10
• Jain, Netrapalli, Sanghavi ’12
• Hardt ’13
• Sun, Luo ’14
• Chen, Wainwright ’15
• Tu, Boczar, Simchowitz, Soltanolkotabi, Recht ’15
• Zhao, Wang, Liu ’15
• Zheng, Lafferty ’16
• Yi, Park, Chen, Caramanis ’16
• Ge, Lee, Ma ’16
• Ge, Jin, Zheng ’17
• Ma, Wang, Chi, Chen ’17
• Chen, Li ’18
• Chen, Liu, Li ’19
• …
• suitable initialization: $(X^0, Y^0)$
• gradient descent: for $t = 0, 1, \ldots$

$$X^{t+1} = X^t - \eta_t \nabla_X f(X^t, Y^t), \qquad Y^{t+1} = Y^t - \eta_t \nabla_Y f(X^t, Y^t)$$
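The two-step recipe above can be sketched in NumPy as follows. Several pieces are our own assumptions, since the slides leave them open: spectral initialization from the top-$r$ SVD of $\mathcal{P}_\Omega(M)/p$ as the "suitable initialization", $\operatorname{reg}(X,Y) = \tfrac{\lambda}{2}(\|X\|_F^2 + \|Y\|_F^2)$, and a constant step size $\eta_t = \eta$ scaled by the estimated top singular value. The function name `gd_matrix_completion` is hypothetical.

```python
import numpy as np


def gd_matrix_completion(M_obs, mask, r, lam=0.0, c=0.2, iters=500):
    """Gradient descent on the factored objective (hypothetical helper)
        f(X, Y) = sum_{(i,j) in Omega} ((X Y^T)_ij - M_ij)^2 + (lam/2)(||X||_F^2 + ||Y||_F^2),
    where the ridge form of reg(X, Y) is an assumption.
    """
    p = mask.mean()  # empirical sampling rate
    # Assumed spectral initialization: top-r SVD of P_Omega(M) / p, split evenly
    U, s, Vt = np.linalg.svd(np.where(mask, M_obs, 0.0) / p)
    X = U[:, :r] * np.sqrt(s[:r])
    Y = Vt[:r].T * np.sqrt(s[:r])
    # Constant step size of order 1 / (2 p sigma_max); the 2 matches the un-halved loss
    eta = c / (2 * p * s[0])
    for _ in range(iters):
        R = np.where(mask, X @ Y.T - M_obs, 0.0)  # residual on observed entries only
        grad_X = 2 * R @ Y + lam * X
        grad_Y = 2 * R.T @ X + lam * Y
        X, Y = X - eta * grad_X, Y - eta * grad_Y  # simultaneous update of both factors
    return X, Y
```

The constant `c` is a tuning knob chosen for illustration, not a value from the talk.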
Same statistical model as before:
• random sampling: each $(i,j) \in \Omega$ independently with probability $p$
• random noise: i.i.d. sub-Gaussian noise with variance $\sigma^2$
• true matrix $M^\star \in \mathbb{R}^{n\times n}$: rank $r = O(1)$, incoherent, …
[Figure: estimation error $\|\widehat{M} - M^\star\|_F$ versus noise standard deviation $\sigma$]
• minimax limit: $\sigma\sqrt{n/p}$
• nonconvex algorithms: $\sigma\sqrt{n/p}$ (optimal!)