SLIDE 1

Uncertainty quantification for nonconvex tensor completion

Yuxin Chen, Electrical Engineering, Princeton University

SLIDE 2

  • Changxiao Cai, Princeton EE
  • H. Vincent Poor, Princeton EE

SLIDE 3

Ubiquity of high-dimensional tensor data

  • computational genomics — fig. credit: Schreiber et al. '19
  • dynamic MRI — fig. credit: Liu et al. '17

SLIDE 4

Challenges in tensor reconstruction

a tensor of interest

SLIDE 5

Challenges in tensor reconstruction

a tensor of interest + missing data

SLIDE 6

Challenges in tensor reconstruction

a tensor of interest + missing data + noise

SLIDE 7

Key to enabling reliable reconstruction from incomplete & noisy data: — exploiting low (CP) rank structure

SLIDE 8

Noisy tensor completion

SLIDE 9

Mathematical model

(figure: ground-truth tensor T⋆ and its partial observations T^obs)

  • unknown rank-r tensor T⋆ ∈ ℝ^{d×d×d}:

        T⋆ = ∑_{i=1}^r u⋆_i ⊗ u⋆_i ⊗ u⋆_i

SLIDE 10

Mathematical model

(figure: ground-truth tensor T⋆ and its partial observations T^obs)

  • unknown rank-r tensor T⋆ ∈ ℝ^{d×d×d}:

        T⋆ = ∑_{i=1}^r u⋆_i ⊗ u⋆_i ⊗ u⋆_i

  • partial observations over a sampling set Ω:

        T^obs_{i,j,k} = T⋆_{i,j,k} + noise,   (i, j, k) ∈ Ω

SLIDE 11

Mathematical model

(figure: ground-truth tensor T⋆ and its partial observations T^obs)

  • unknown rank-r tensor T⋆ ∈ ℝ^{d×d×d}:

        T⋆ = ∑_{i=1}^r u⋆_i ⊗ u⋆_i ⊗ u⋆_i

  • partial observations over a sampling set Ω:

        T^obs_{i,j,k} = T⋆_{i,j,k} + noise,   (i, j, k) ∈ Ω

  • goal: estimate {u⋆_i}_{i=1}^r and T⋆ (a synthetic sketch in code follows)
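To make this model concrete, here is a minimal synthetic sketch in NumPy. The dimension, rank, sampling rate, and noise level are illustrative choices, not values from the talk.

```python
import numpy as np

# illustrative sizes (not the talk's settings)
d, r, p, sigma = 50, 3, 0.2, 0.002
rng = np.random.default_rng(0)

# ground-truth factors u*_1, ..., u*_r as the columns of U_star
U_star = rng.normal(size=(d, r)) / np.sqrt(d)

# rank-r symmetric CP tensor: T* = sum_s u*_s ⊗ u*_s ⊗ u*_s
T_star = np.einsum('is,js,ks->ijk', U_star, U_star, U_star)

# each entry observed independently with probability p, then corrupted by noise
Omega = rng.random((d, d, d)) < p
T_obs = np.where(Omega, T_star + sigma * rng.normal(size=(d, d, d)), 0.0)
```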

SLIDE 12

Prior art

  • convex relaxation
  • sum-of-squares hierarchy
  • spectral methods
  • nonconvex optimization

SLIDE 13

Prior art

  • Gandy, Recht, Yamada ’11
  • Liu, Musialski, Wonka, Ye ’12
  • Kressner, Steinlechner, Vandereycken ’13
  • Xu, Hao, Yin, Su ’13
  • Romera-Paredes, Pontil ’13
  • Jain, Oh ’14
  • Huang, Mu, Goldfarb, Wright ’15
  • Barak, Moitra ’16
  • Zhang, Aeron ’16
  • Yuan, Zhang ’16
  • Montanari, Sun ’16
  • Kasai, Mishra ’16
  • Potechin, Steurer ’17
  • Dong, Yuan, Zhang ’17
  • Xia, Yuan ’19
  • Zhang ’19
  • Cai, Li, Poor, Chen ’19
  • Cai, Li, Chi, Poor, Chen ’19
  • Liu, Moitra ’20
  • . . .

SLIDE 14

A nonconvex approach: Cai et al. (NeurIPS 19)

    minimize_{U=[u_1,···,u_r] ∈ ℝ^{d×r}}   f(U) := ∑_{(i,j,k)∈Ω} ( [∑_{s=1}^r u_s^{⊗3}]_{i,j,k} − T^obs_{i,j,k} )²    — squared loss

  • proper initialization U⁰, obtained in two steps:
      1. estimate the subspace spanned by the low-rank tensor factors — unfolding + spectral methods
      2. successively retrieve each tensor factor from the subspace estimates — random projection + spectral methods
  • gradient descent (nonconvex) with constant learning rates: for t = 0, 1, · · ·

        U^{t+1} = U^t − η_t ∇f(U^t)

(the gradient stage is sketched in code below)
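A hedged sketch of the gradient stage only, reusing the synthetic data above; the spectral initialization is not reproduced here, so `U0` stands in for its output, and the step size is an illustrative choice.

```python
def gradient_descent(T_obs, Omega, U0, eta=0.2, n_iters=300):
    """Plain gradient descent on the squared loss f(U) over the observed entries."""
    U = U0.copy()
    for _ in range(n_iters):
        T_hat = np.einsum('is,js,ks->ijk', U, U, U)   # current rank-r estimate
        R = np.where(Omega, T_hat - T_obs, 0.0)       # residual, supported on Ω
        # ∇f(U): the squared loss differentiated through each of the three modes
        G = 2 * (np.einsum('ajk,js,ks->as', R, U, U)
                 + np.einsum('iak,is,ks->as', R, U, U)
                 + np.einsum('ija,is,js->as', R, U, U))
        U = U - eta * G                               # constant learning rate
    return U
```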

SLIDE 15

A nonconvex approach: Cai et al. (NeurIPS 19)

(figure: estimation error vs. iteration count)

Under mild conditions, this nonconvex algorithm achieves

  • linear convergence
  • minimax-optimal statistical accuracy (up to log factor)

SLIDE 16

One step further: reasoning about uncertainty?


SLIDE 17

One step further: reasoning about uncertainty?


How to assess uncertainty, or “confidence”, of obtained estimates due to imperfect data acquisition?

  • noise
  • incomplete measurements
  • · · ·

SLIDE 18

Challenges

    minimize_{U=[u_1,···,u_r] ∈ ℝ^{d×r}}   f(U) := ∑_{(i,j,k)∈Ω} ( [∑_{s=1}^r u_s^{⊗3}]_{i,j,k} − T^obs_{i,j,k} )²    — squared loss

  • how to pin down distributions of nonconvex solutions?

SLIDE 19

Challenges

    minimize_{U=[u_1,···,u_r] ∈ ℝ^{d×r}}   f(U) := ∑_{(i,j,k)∈Ω} ( [∑_{s=1}^r u_s^{⊗3}]_{i,j,k} − T^obs_{i,j,k} )²    — squared loss

  • how to pin down distributions of nonconvex solutions?
  • how to adapt to unknown noise distributions and heteroscedasticity (i.e. location-varying noise variance)?

SLIDE 20

Challenges

    minimize_{U=[u_1,···,u_r] ∈ ℝ^{d×r}}   f(U) := ∑_{(i,j,k)∈Ω} ( [∑_{s=1}^r u_s^{⊗3}]_{i,j,k} − T^obs_{i,j,k} )²    — squared loss

  • how to pin down distributions of nonconvex solutions?
  • how to adapt to unknown noise distributions and heteroscedasticity (i.e. location-varying noise variance)?
  • existing estimation guarantees are highly insufficient → overly wide confidence intervals

SLIDE 21

Assumptions

    T⋆ = ∑_{i=1}^r u⋆_i ⊗ u⋆_i ⊗ u⋆_i ∈ ℝ^{d×d×d}

  • random sampling: each entry is observed independently with prob. p ≳ polylog(d) / d^{3/2}
  • random noise: independent zero-mean sub-Gaussian, with variances of roughly the same order (but not identical) — a heteroscedastic variant is sketched below
  • ground truth: low-rank (r = O(1)), incoherent (tensor factors are de-localized and nearly orthogonal to each other), and well-conditioned
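The earlier synthetic sketch can be varied to match this heteroscedastic setting, giving each entry its own noise level of roughly the same order; the 0.5–1.5 range is an illustrative choice.

```python
# entrywise standard deviations of comparable order, but not identical
sigmas = sigma * (0.5 + rng.random((d, d, d)))
T_obs_hetero = np.where(Omega, T_star + sigmas * rng.normal(size=(d, d, d)), 0.0)
```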

SLIDE 22

Main results: distributional theory

  • random sampling
  • independent sub-Gaussian noise
  • ground truth: low-rank, incoherent, well-conditioned

(figure: entrywise error distribution of the factor estimate U)

Theorem 1. With high prob., there exists a permutation matrix Π ∈ ℝ^{r×r} s.t.

    UΠ − U⋆ ∼ N(0, Cramér–Rao) + negligible term    — asymptotically optimal

SLIDE 23

Main results: distributional theory

  • random sampling
  • independent sub-Gaussian noise
  • ground truth: low-rank, incoherent, well-conditioned

(figure: entrywise error distribution of the tensor estimate T)

Theorem 2. Consider any (i, j, k) s.t. the corresponding "SNR" is not exceedingly small. Then with high prob.,

    T_{i,j,k} − T⋆_{i,j,k} ∼ N(0, Cramér–Rao) + negligible term    — asymptotically optimal
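Once the Cramér–Rao variance is estimated, Theorem 2 yields textbook Gaussian confidence intervals. A minimal sketch, where `v_hat` stands in for the paper's data-driven variance estimate:

```python
from scipy.stats import norm

def entry_ci(t_hat, v_hat, alpha=0.05):
    """(1 - alpha) confidence interval for one tensor entry, assuming the
    Gaussian approximation T_hat[i,j,k] - T*[i,j,k] ~ N(0, v_hat)."""
    z = norm.ppf(1 - alpha / 2)     # ≈ 1.96 for 95% coverage
    half = z * v_hat ** 0.5
    return t_hat - half, t_hat + half
```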

SLIDE 24
  • Gaussianity and optimality: estimation error of the nonconvex approach is zero-mean Gaussian, whose (co)variance is "minimal"

SLIDE 25
(figures: empirical error distributions of a tensor factor entry and of a tensor entry)

  • Gaussianity and optimality: estimation error of the nonconvex approach is zero-mean Gaussian, whose (co)variance is "minimal"
  • Confidence intervals: error (co)variance can be accurately estimated, leading to valid CI construction

SLIDE 26
(figures: empirical error distributions of a tensor factor entry and of a tensor entry)

  • Gaussianity and optimality: estimation error of the nonconvex approach is zero-mean Gaussian, whose (co)variance is "minimal"
  • Confidence intervals: error (co)variance can be accurately estimated, leading to valid CI construction
  • Adaptivity: our procedure is data-driven, fully adaptive to unknown noise levels and heteroscedasticity

SLIDE 27

Empirical coverage rates (CR)

tensor factor

    (r, σ)       Mean(CR)   Std(CR)
    (2, 10⁻²)    0.9481     0.0201
    (2, 10⁻¹)    0.9477     0.0228
    (2, 1)       0.9478     0.0215
    (4, 10⁻²)    0.9450     0.0218
    (4, 10⁻¹)    0.9472     0.0231
    (4, 1)       0.9462     0.0234

tensor entries

    (r, σ)       Mean(CR)   Std(CR)
    (2, 10⁻²)    0.9494     0.0218
    (2, 10⁻¹)    0.9513     0.0218
    (2, 1)       0.9475     0.0222
    (4, 10⁻²)    0.9434     0.0225
    (4, 10⁻¹)    0.9494     0.0220
    (4, 1)       0.9494     0.0219

d = 100, p = 0.2, heteroscedastic noise
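A hedged outline of how such coverage rates can be computed: over Monte Carlo trials, record how often the nominal 95% interval covers the truth. `make_data` and `fit` are hypothetical stand-ins for the data generator and the full estimation pipeline; `entry_ci` is the sketch from earlier.

```python
def empirical_coverage(n_trials, make_data, fit, idx, alpha=0.05):
    hits = 0
    for _ in range(n_trials):
        T_true, T_obs, Omega = make_data()   # fresh heteroscedastic sample
        T_hat, v_hat = fit(T_obs, Omega)     # point estimate + variance estimate
        lo, hi = entry_ci(T_hat[idx], v_hat[idx], alpha)
        hits += lo <= T_true[idx] <= hi
    return hits / n_trials                   # valid CIs give ≈ 1 - alpha
```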

SLIDE 28

Back to estimation: ℓ2 optimality

Distributional theory in turn allows us to track estimation accuracy

SLIDE 29

Back to estimation: ℓ2 optimality

Distributional theory in turn allows us to track estimation accuracy.

Theorem 3. Suppose the noise is i.i.d. Gaussian. ∃ some permutation π(·) s.t.

    ‖u_{π(l)} − u⋆_l‖₂² = (2 + o(1)) σ²d / (p‖u⋆_l‖₂⁴),   1 ≤ l ≤ r    — Cramér–Rao lower bound

    ‖T − T⋆‖²_F = (6 + o(1)) σ²rd / p    — Cramér–Rao lower bound

SLIDE 30

Back to estimation: ℓ2 optimality

Distributional theory in turn allows us to track estimation accuracy.

Theorem 3. Suppose the noise is i.i.d. Gaussian. ∃ some permutation π(·) s.t.

    ‖u_{π(l)} − u⋆_l‖₂² = (2 + o(1)) σ²d / (p‖u⋆_l‖₂⁴),   1 ≤ l ≤ r    — Cramér–Rao lower bound

    ‖T − T⋆‖²_F = (6 + o(1)) σ²rd / p    — Cramér–Rao lower bound

  • precise characterization of estimation accuracy
  • achieves full statistical efficiency (including pre-constant) — a numerical check is sketched below
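A hedged numerical check of the first display, reusing the homoscedastic synthetic data and the `gradient_descent` sketch from earlier. Initializing near the truth sidesteps the permutation-matching step for simplicity, and agreement is only approximate at these small illustrative sizes.

```python
U_hat = gradient_descent(T_obs, Omega, U_star + 0.01 * rng.normal(size=(d, r)))
for l in range(r):
    theory = 2 * sigma**2 * d / (p * np.linalg.norm(U_star[:, l]) ** 4)
    empirical = np.linalg.norm(U_hat[:, l] - U_star[:, l]) ** 2
    print(f"factor {l}: empirical {empirical:.2e} vs. theory {theory:.2e}")
```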

SLIDE 31

Numerical ℓ2 errors vs. Cramér–Rao bounds

(figures: numerical ℓ2 errors tracking the Cramér–Rao bounds, for tensor factor estimation and for tensor estimation; r = 4, p = 0.2, d = 100)

SLIDE 32

Concluding remarks

nonconvex optimization and uncertainty quantification for tensor estimation

  • near-optimal statistical guarantees
  • asymptotically optimal uncertainty quantification
  • fast, adaptive to unknown noise levels

SLIDE 33

Concluding remarks

nonconvex optimization and uncertainty quantification for tensor estimation

  • near-optimal statistical guarantees
  • asymptotically optimal uncertainty quantification
  • fast, adaptive to unknown noise levels

future directions

  • improve dependency on rank & condition number
  • more general sampling patterns
  • other tensor-type problems

SLIDE 34

Paper:

  • "Uncertainty quantification for nonconvex tensor completion: Confidence intervals, heteroscedasticity and optimality," C. Cai, H. V. Poor, Y. Chen, ICML 2020

Other related papers:

  • "Nonconvex low-rank symmetric tensor completion from noisy data," C. Cai, G. Li, H. V. Poor, Y. Chen, NeurIPS 2019

  • "Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees," C. Cai, G. Li, Y. Chi, H. V. Poor, Y. Chen, accepted to the Annals of Statistics, 2019
