Uncertainty quantification for nonconvex tensor completion

  1. Uncertainty quantification for nonconvex tensor completion. Yuxin Chen, Electrical Engineering, Princeton University

  2. Changxiao Cai (Princeton EE) and H. Vincent Poor (Princeton EE)

  3. Ubiquity of high-dimensional tensor data: computational genomics, dynamic MRI. (Fig. credits: Schreiber et al. '19; Liu et al. '17.)

  4–6. Challenges in tensor reconstruction: a tensor of interest, missing data, noise.

  7. Key to enabling reliable reconstruction from incomplete & noisy data: exploiting low (CP) rank structure.

  8. Noisy tensor completion

  9–11. Mathematical model
  • unknown rank-$r$ tensor $T^\star \in \mathbb{R}^{d \times d \times d}$: $T^\star = \sum_{i=1}^{r} u_i^\star \otimes u_i^\star \otimes u_i^\star$
  • partial observations over a sampling set $\Omega$: $T^{\mathrm{obs}}_{i,j,k} = T^\star_{i,j,k} + \mathrm{noise}$, $(i,j,k) \in \Omega$
  • goal: estimate $\{u_i^\star\}_{i=1}^{r}$ and $T^\star$ (a toy simulation of this observation model is sketched below)
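To make the observation model concrete, here is a minimal NumPy sketch (my illustration, not code from the talk) that draws a symmetric rank-r CP tensor, reveals each entry independently with probability p, and corrupts the revealed entries with Gaussian noise as a stand-in for the sub-Gaussian noise assumed later. The function name, variable names, and default parameter values are placeholders.

```python
import numpy as np

def make_observations(d=50, r=3, p=0.2, sigma=0.1, seed=0):
    """Draw T* = sum_i u_i* (x) u_i* (x) u_i* in R^{d x d x d}, then observe
    each entry independently with probability p under additive noise."""
    rng = np.random.default_rng(seed)
    U_star = rng.standard_normal((d, r))             # columns are the factors u_i*
    T_star = np.einsum('ia,ja,ka->ijk', U_star, U_star, U_star)
    mask = rng.random((d, d, d)) < p                 # sampling set Omega
    noise = sigma * rng.standard_normal((d, d, d))   # toy Gaussian noise
    T_obs = np.where(mask, T_star + noise, 0.0)      # unobserved entries left at 0
    return T_star, T_obs, mask, U_star
```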

  12. Prior art: sum-of-squares hierarchy, convex relaxation, spectral methods, nonconvex optimization.

  13. Prior art • Gandy, Recht, Yamada ’11 • Liu, Musialski, Wonka, Ye ’12 • Kressner, Steinlechner, Vandereycken ’13 • Xu, Hao, Yin, Su ’13 • Romera-Paredes, Pontil ’13 • Jain, Oh ’14 • Huang, Mu, Goldfarb, Wright ’15 • Barak, Moitra ’16 • Zhang, Aeron ’16 • Yuan, Zhang ’16 • Montanari, Sun ’16 • Kasai, Mishra ’16 • Potechin, Steurer ’17 • Dong, Yuan, Zhang ’17 • Xia, Yuan ’19 • Zhang ’19 • Cai, Li, Poor, Chen ’19 • Cai, Li, Chi, Poor, Chen ’19 • Liu, Moitra ’20 • ...

  14. A nonconvex approach: Cai et al. (NeurIPS '19)

  $$\operatorname*{minimize}_{U=[u_1,\cdots,u_r]\in\mathbb{R}^{d\times r}}\; f(U) := \sum_{(i,j,k)\in\Omega}\Big(\big(\textstyle\sum_{s=1}^{r} u_s^{\otimes 3}\big)_{i,j,k} - T^{\mathrm{obs}}_{i,j,k}\Big)^2 \qquad \text{(squared loss)}$$

  • proper initialization $U^0$:
    1. estimate the subspace spanned by the low-rank tensor factors (unfolding + spectral methods)
    2. successively retrieve each tensor factor from the subspace estimates (random projection + spectral methods)
  • (nonconvex) gradient descent with constant learning rates:
    3. for $t = 0, 1, \cdots$, update $U^{t+1} = U^t - \eta_t \nabla f(U^t)$ (a minimal code sketch of this update follows)
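As a rough companion to the recipe above, the following sketch runs plain gradient descent on the symmetric squared loss f(U). It is a simplified stand-in, not the authors' implementation: the spectral initialization of steps 1–2 is omitted (U0 is simply passed in), and the step size eta and iteration count are illustrative.

```python
import numpy as np

def grad_f(U, T_obs, mask):
    """Gradient of f(U) = sum_{(i,j,k) in Omega} ((sum_s u_s^{x3})_{ijk} - T_obs_{ijk})^2."""
    T_fit = np.einsum('ia,ja,ka->ijk', U, U, U)      # current symmetric rank-r fit
    R = 2.0 * mask * (T_fit - T_obs)                 # weighted residual, zero off Omega
    # chain rule: one contribution from each of the three tensor modes
    return (np.einsum('ijk,ja,ka->ia', R, U, U)
            + np.einsum('ijk,ia,ka->ja', R, U, U)
            + np.einsum('ijk,ia,ja->ka', R, U, U))

def gradient_descent(U0, T_obs, mask, eta=1e-3, n_iters=200):
    """Run U^{t+1} = U^t - eta * grad f(U^t) from an initialization U0
    (obtained in the talk via spectral methods; here simply an input)."""
    U = U0.copy()
    for _ in range(n_iters):
        U = U - eta * grad_f(U, T_obs, mask)
    return U
```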

  15. A nonconvex approach: Cai et al. (NeurIPS '19). [Figure: estimation error (log scale) over the iterations.] Under mild conditions, this nonconvex algorithm achieves:
  • linear convergence
  • minimax-optimal statistical accuracy (up to log factor)

  16–17. One step further: reasoning about uncertainty? How to assess the uncertainty, or "confidence", of the obtained estimates due to imperfect data acquisition?
  • noise
  • incomplete measurements
  • ...

  18–20. Challenges

  $$\operatorname*{minimize}_{U=[u_1,\cdots,u_r]\in\mathbb{R}^{d\times r}}\; f(U) := \sum_{(i,j,k)\in\Omega}\Big(\big(\textstyle\sum_{s=1}^{r} u_s^{\otimes 3}\big)_{i,j,k} - T^{\mathrm{obs}}_{i,j,k}\Big)^2 \qquad \text{(squared loss)}$$

  • how to pin down the distributions of nonconvex solutions?
  • how to adapt to unknown noise distributions and heteroscedasticity (i.e. location-varying noise variance)?
  • existing estimation guarantees are highly insufficient, leading to overly wide confidence intervals

  21. Assumptions

  $$T^\star = \sum_{i=1}^{r} u_i^\star \otimes u_i^\star \otimes u_i^\star \in \mathbb{R}^{d \times d \times d}$$

  • random sampling: each entry is observed independently with prob. $p \gtrsim \mathrm{polylog}(d)/d^{3/2}$
  • random noise: independent zero-mean sub-Gaussian, with variances of roughly the same order (but not identical)
  • ground truth: low-rank ($r = O(1)$), incoherent (tensor factors are de-localized and nearly orthogonal to each other), and well-conditioned

  22. Main results: distributional theory. [Figure: scatter plot of the estimation error of U.] Under random sampling, independent sub-Gaussian noise, and a low-rank, incoherent, well-conditioned ground truth:

  Theorem 1. With high probability, there exists a permutation matrix $\Pi \in \mathbb{R}^{r \times r}$ such that
  $$U\Pi - U^\star \sim \mathcal{N}\big(0, \text{Cramér–Rao}\big) + \text{negligible term}$$
  (asymptotically optimal)

  23. Main results: distributional theory. [Figure: scatter plot of the estimation error of T.] Under the same assumptions (random sampling, independent sub-Gaussian noise, low-rank, incoherent, well-conditioned ground truth):

  Theorem 2. Consider any $(i,j,k)$ such that the corresponding "SNR" is not exceedingly small. Then with high probability,
  $$T_{i,j,k} - T^\star_{i,j,k} \sim \mathcal{N}\big(0, \text{Cramér–Rao}\big) + \text{negligible term}$$
  (asymptotically optimal)

  24–26. [Figure: histograms of the estimation error for a tensor factor entry and a tensor entry.]
  • Gaussianity and optimality: the estimation error of the nonconvex approach is zero-mean Gaussian, whose (co)variance is "minimal"
  • Confidence intervals: the error (co)variance can be accurately estimated, leading to valid CI construction (a minimal construction is sketched below)
  • Adaptivity: our procedure is data-driven, fully adaptive to unknown noise levels and heteroscedasticity
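The slides state that the error (co)variance can be estimated accurately; the interval that follows from an entrywise Gaussian approximation is then the standard one, sketched below. The variance estimate var_hat is assumed to come from whatever data-driven plug-in estimator of the Cramér–Rao variance is used; its construction is not shown here.

```python
import numpy as np
from scipy.stats import norm

def entrywise_ci(estimate, var_hat, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for one quantity (a tensor
    entry or a tensor-factor entry), based on estimate - truth ~ N(0, var_hat)."""
    z = norm.ppf(1.0 - alpha / 2.0)                  # ~1.96 for a 95% interval
    half_width = z * np.sqrt(var_hat)
    return estimate - half_width, estimate + half_width
```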

  27. Empirical coverage rates (CR); d = 100, p = 0.2, heteroscedastic noise

      (r, σ)        tensor factor: Mean(CR) / Std(CR)    tensor entries: Mean(CR) / Std(CR)
      (2, 10^-2)    0.9481 / 0.0201                      0.9494 / 0.0218
      (2, 10^-1)    0.9477 / 0.0228                      0.9513 / 0.0218
      (2, 1)        0.9478 / 0.0215                      0.9475 / 0.0222
      (4, 10^-2)    0.9450 / 0.0218                      0.9434 / 0.0225
      (4, 10^-1)    0.9472 / 0.0231                      0.9494 / 0.0220
      (4, 1)        0.9462 / 0.0234                      0.9494 / 0.0219
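For context, empirical coverage rates such as Mean(CR) above are typically obtained by Monte Carlo: repeat the experiment, build the nominal 95% intervals, and record how often they contain the truth. A minimal sketch of that bookkeeping (my illustration, with hypothetical inputs, not the authors' script) follows.

```python
import numpy as np
from scipy.stats import norm

def empirical_coverage(truths, estimates, variances, alpha=0.05):
    """Fraction of entries whose two-sided (1 - alpha) interval
    estimate +/- z * sqrt(variance) covers the corresponding true value."""
    z = norm.ppf(1.0 - alpha / 2.0)
    covered = np.abs(estimates - truths) <= z * np.sqrt(variances)
    return float(covered.mean())
```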

  28–30. Back to estimation: $\ell_2$ optimality

  Distributional theory in turn allows us to track estimation accuracy.

  Theorem 3. Suppose the noise is i.i.d. Gaussian. Then there exists some permutation $\pi(\cdot)$ such that
  $$\|u_{\pi(l)} - u_l^\star\|_2^2 = \frac{(2 + o(1))\,\sigma^2 d}{p\,\|u_l^\star\|_2^4}, \quad 1 \le l \le r \qquad \text{(Cramér–Rao lower bound)}$$
  $$\|T - T^\star\|_{\mathrm{F}}^2 = \frac{(6 + o(1))\,\sigma^2 r d}{p} \qquad \text{(Cramér–Rao lower bound)}$$

  • precise characterization of estimation accuracy
  • achieves full statistical efficiency (including the pre-constant)
  (a numerical evaluation of these formulas is sketched below)
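To read Theorem 3 numerically, one can evaluate its leading-order (Cramér–Rao) error levels for a given configuration. The sketch below simply drops the o(1) terms; the example values of sigma and the factor norms are placeholders rather than numbers from the slides.

```python
import numpy as np

def theorem3_errors(sigma, d, p, r, factor_norms):
    """Leading-order predictions of Theorem 3 (o(1) terms dropped):
    per-factor squared l2 error 2*sigma^2*d / (p*||u_l*||^4) and
    total squared Frobenius error 6*sigma^2*r*d / p."""
    factor_errors = 2.0 * sigma**2 * d / (p * np.asarray(factor_norms, float)**4)
    tensor_error = 6.0 * sigma**2 * r * d / p
    return factor_errors, tensor_error

# illustrative configuration matching the plot parameters on the next slide (r = 4, p = 0.2, d = 100)
per_factor, total = theorem3_errors(sigma=0.1, d=100, p=0.2, r=4,
                                    factor_norms=[1.0, 1.0, 1.0, 1.0])
```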

  31. Numerical $\ell_2$ errors vs. Cramér–Rao bounds. [Figure: two log–log panels, tensor factor estimation (left) and tensor estimation (right); r = 4, p = 0.2, d = 100.]

  32–33. Concluding remarks

  Nonconvex optimization for tensor estimation and uncertainty quantification:
  • fast, adaptive to unknown noise levels
  • near-optimal statistical guarantees for estimation
  • (asymptotically) optimal uncertainty quantification

  Future directions:
  • improve the dependency on rank & condition number
  • more general sampling patterns
  • other tensor-type problems
