Asymmetry Helps: Estimation and Inference from Asymmetric and Heteroscedastic Noise

Chen Cheng

with Yuxin Chen (Princeton), Jianqing Fan (Princeton), Yuting Wei (CMU)

Department of Statistics, Stanford University

References:

  • C. Cheng, Y. Wei, Y. Chen, "Inference for linear forms of eigenvectors under minimal eigenvalue separation: Asymmetry and heteroscedasticity", arXiv:2001.04620, 2020.
  • Y. Chen, C. Cheng, J. Fan, "Asymmetry helps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices", arXiv:1811.12804, 2018 (authors listed in alphabetical order); accepted to the Annals of Statistics, 2020.

Outline: 1. Introduction  2. Estimation  3. Inference

Problem: eigenvalue & eigenvector estimation

M⋆: symmetric low-rank matrix. H = [H_ij]_{1≤i,j≤n}: independent noise.

  • Rank-r matrix: M⋆ = Σ_{l=1}^{r} λ⋆_l u⋆_l u⋆_l⊤ ∈ ℝ^{n×n}.
  • Observed data: M = M⋆ + H.
  • Applications:
      • matrix denoising and completion
      • stochastic block model
      • ranking from pairwise comparisons
      • ...
  • Goal: retrieve eigenvalue & eigenvector information from M.
  • Quantities of interest: eigenvalue error; eigenvector ℓ₂ error, ℓ∞ error, and the error of any linear form a⊤u_l.
  • Strategy:
      • SVD on M or on (M + M⊤)/2? (Popular strategies)
      • Eigen-decomposition on M? (Much less widely used)
A curious experiment: Gaussian noise

  • M = u⋆u⋆⊤ + H, with H_ij i.i.d. N(0, σ²) and σ = 1/√(n log n).
  • Estimate the leading eigenvalue λ⋆ = 1.
  • SVD on M vs. eigen-decomposition on M.

[Figure: |λ̂ − λ⋆| vs. n (log scales), built up over three panels: the SVD error |λsvd − λ⋆|; the eigen-decomposition error |λeigs − λ⋆|; and a rescaled SVD error curve, |λsvd − λ⋆|/(2.5√n), which falls on top of the eigen-decomposition error.]

  • Wait! But we should know everything under Gaussian noise!
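A minimal numpy sketch of this experiment (the dimensions, seed, and trial structure are illustrative; for the asymmetric M we take the eigenvalue of largest magnitude as λeigs):

```python
import numpy as np

rng = np.random.default_rng(0)

def one_trial(n):
    sigma = 1.0 / np.sqrt(n * np.log(n))
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                       # unit-norm leading eigenvector u*
    M = np.outer(u, u) + sigma * rng.standard_normal((n, n))
    lam_svd = np.linalg.svd(M, compute_uv=False)[0]
    evals = np.linalg.eigvals(M)                 # complex in general for asymmetric M
    lam_eig = evals[np.argmax(np.abs(evals))]
    return abs(lam_svd - 1.0), abs(lam_eig - 1.0)

for n in (200, 500, 1000, 2000):
    e_svd, e_eig = one_trial(n)
    print(f"n={n:5d}   |lam_svd - 1| = {e_svd:.5f}   |lam_eigs - 1| = {e_eig:.5f}")
```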
A curious experiment: Gaussian noise

  • Indeed, for SVD under i.i.d. Gaussian noise, one can use a corrected singular value (Benaych-Georges and Nadakuditi, 2012): λsvd,c = λsvd − nσ²/λsvd = f(σ, λsvd).

[Figure: |λ̂ − λ⋆| vs. n (log scales); legend: Eigen-Decomposition, SVD, Corrected SVD.]

  • For heteroscedastic Gaussian noise, however, the correction formula is far more complicated (Bryc et al., 2018).
Another experiment: matrix completion

  • What if the noise is heteroscedastic and we have no prior knowledge about it?
  • M⋆ = u⋆u⋆⊤; observe M_ij = (1/p)·M⋆_ij with probability p, and M_ij = 0 otherwise, where p = 3 log n / n. The effective noise is H = M − M⋆.

[Figure: |λ̂ − λ⋆| vs. n (log scales); legend: Eigen-Decomposition, SVD, and the rescaled SVD error |λsvd − λ⋆|/(2.5√n).]

  • Eigen-decomposition is nearly unbiased regardless of the noise distribution!
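The masking experiment admits an equally short sketch (illustrative sizes; the inverse-probability weighting matches the sampling model above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
p = 3 * np.log(n) / n

u = np.ones(n) / np.sqrt(n)          # incoherent leading eigenvector u*
M_star = np.outer(u, u)              # lambda* = 1
mask = rng.random((n, n)) < p
M = np.where(mask, M_star / p, 0.0)  # unbiased entrywise: E[M] = M_star

lam_svd = np.linalg.svd(M, compute_uv=False)[0]
evals = np.linalg.eigvals(M)
lam_eig = evals[np.argmax(np.abs(evals))]
print(f"|lam_svd - 1| = {abs(lam_svd - 1):.4f}   |lam_eigs - 1| = {abs(lam_eig - 1):.4f}")
```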
One more experiment: heteroscedastic Gaussian noise

  • M = u⋆₁u⋆₁⊤ + 0.95·u⋆₂u⋆₂⊤ + H, with u⋆₁ = (1/√n)·[1_{n/2}; 1_{n/2}] and u⋆₂ = (1/√n)·[1_{n/2}; −1_{n/2}].
  • [Var(H_ij)]_{i,j}: heteroscedastic at two levels, approximately 1/(n log n) on one block of entries and (1/100)·1/(n log n) on the rest.
  • Estimate u⋆₂ by eigen-decomposition on the symmetrized data (M + M⊤)/2 and on the original asymmetric data M.

[Figure: dist(u₂, u⋆₂) vs. dimension n, for eigen-decomposition on (M + M⊤)/2 (eigen-sym / Spectral-sym) and on M (eigen-asym / Spectral-asym).]

Symmetrization for heteroscedastic data seems suboptimal!
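A sketch of this comparison; the two-level row-wise noise profile below is a stand-in assumption for the variance profile above, and dist(·,·) is taken to be the sign-aligned ℓ₂ distance:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
half = n // 2

u1 = np.ones(n) / np.sqrt(n)
u2 = np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(n)
M_star = np.outer(u1, u1) + 0.95 * np.outer(u2, u2)

# two-level heteroscedastic noise: top rows have 10x the std of bottom rows
sd = np.where(np.arange(n)[:, None] < half, 1.0, 0.1) / np.sqrt(n * np.log(n))
M = M_star + sd * rng.standard_normal((n, n))

def dist(v, v_star):
    return min(np.linalg.norm(v - v_star), np.linalg.norm(v + v_star))

def second_eigvec(A):
    """Eigenvector of the second-largest-magnitude eigenvalue (real part)."""
    evals, evecs = np.linalg.eig(A)
    order = np.argsort(-np.abs(evals))
    return np.real(evecs[:, order[1]])

u2_asym = second_eigvec(M)              # eigen-asym
u2_sym = second_eigvec((M + M.T) / 2)   # eigen-sym
print(f"dist eigen-asym: {dist(u2_asym, u2):.4f}   eigen-sym: {dist(u2_sym, u2):.4f}")
```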

Why does eigen-decomposition work so well on asymmetric data?

Part 2: Estimation

Problem setup

M = Σ_{l=1}^{r} λ⋆_l u⋆_l u⋆_l⊤ (= M⋆) + H ∈ ℝ^{n×n}

  • M⋆: rank-r ground truth with |λ⋆₁| ≥ · · · ≥ |λ⋆_r| > 0.
  • H: noise matrix with
      • independent entries: {H_ij} are independent;
      • zero mean: E[H_ij] = 0;
      • variance: Var(H_ij) ≤ σ²;
      • magnitudes: P{|H_ij| ≥ B} ≤ n⁻¹².
  • M⋆ obeys the incoherence condition ‖u⋆_k‖_∞ ≤ √(μ/n).
Classical linear algebra for eigenvalues

  • |λsvd_l − λ⋆_l| ≤ ‖H‖   (Weyl)
  • |λeigs_l − λ⋆_j| ≤ ‖H‖ for some j   (Bauer-Fike)

Combining these with the matrix Bernstein inequality, ‖H‖ ≲ σ√(n log n) + B log n, gives:

  • |λsvd_l − λ⋆_l| ≲ σ√(n log n) + B log n   (reasonably tight if ‖H‖ is large)
  • |λeigs_l − λ⋆_j| ≲ σ√(n log n) + B log n   (can be significantly improved)

Rank-1: eigenvalue perturbation

Theorem 1 (Chen, Cheng, Fan '18)
Assume σ√(n log n) + B log n ≤ c|λ⋆| for some small constant c > 0. With high probability, the leading eigenvalue λeigs of M obeys
    |λeigs − λ⋆| ≲ √(μ/n)·(σ√(n log n) + B log n).

[Figure: the three-panel eigenvalue-error comparison from the Gaussian experiment above.]

  • Eigen-decomposition is √(n/μ) times better than SVD! (Recall |λsvd − λ⋆| ≍ σ√(n log n) + B log n.)
Rank-1: entrywise eigenvector perturbation

Theorem 2 (Chen, Cheng, Fan '18)
With high probability, the leading eigenvector u of M obeys
    min± ‖u ± u⋆‖_∞ ≲ √(μ/n)·(σ√(n log n) + B log n)/|λ⋆|.

  • If ‖H‖ ≪ |λ⋆|, then min± ‖u ± u⋆‖₂ ≪ ‖u⋆‖₂ (classical bound) and min± ‖u ± u⋆‖_∞ ≪ ‖u⋆‖_∞ (our bound).
  • The entrywise eigenvector perturbation is well-controlled.
Rank-1: perturbation of linear forms of eigenvectors

Theorem 3 (Chen, Cheng, Fan '18)
Fix any unit vector a. With high probability, the leading eigenvector u of M obeys
    min± |a⊤(u ± u⋆)| ≲ max{|a⊤u⋆|, √(μ/n)}·(σ√(n log n) + B log n)/|λ⋆|.

  • If ‖H‖ ≪ |λ⋆|, then min± |a⊤(u ± u⋆)| ≪ max{|a⊤u⋆|, ‖u⋆‖_∞}.
  • The perturbation of an arbitrary linear form of the leading eigenvector is well-controlled.
  • Very few such results are available for symmetric noise.
SLIDE 45

14/28

Classical linear algebra for eigenvector

(eigenvalue separation) ∆l := min

k:k=l |λ⋆ l − λ⋆ k|

∆2 & λı

1 ı 1 λı 2 ı 2 λı 3

slide-46
SLIDE 46

14/28

Classical linear algebra for eigenvector

(eigenvalue separation) ∆l := min

k:k=l |λ⋆ l − λ⋆ k|

∆2 & λı

1 ı 1 λı 2 ı 2 λı 3

min

  • usvd

l

± u⋆

l

  • 2 H

∆l (Davis-Kahan) min

  • ueigs

l

± u⋆

l

  • 2 ??
slide-47
SLIDE 47

14/28

Classical linear algebra for eigenvector

(eigenvalue separation) ∆l := min

k:k=l |λ⋆ l − λ⋆ k|

∆2 & λı

1 ı 1 λı 2 ı 2 λı 3

min

  • usvd

l

± u⋆

l

  • 2 H

∆l (Davis-Kahan) min

  • ueigs

l

± u⋆

l

  • 2 ??

matrix concentration inequality min

  • usvd

l

± u⋆

l

  • 2 σ√n

∆l (requires ∆l H, and B is sufficiently small) min

  • ueigs

l

± u⋆

l

  • 2 ??
slide-48
SLIDE 48

15/28

Rank-r: eigenvalue / eigenvector perturbation

(eigenvalue separation) ∆l := min

k:k=l |λ⋆ l − λ⋆ k|

∆2 & λı

1 ı 1 λı 2 ı 2 λı 3

Theorem 4 (Cheng, Wei, Chen ’20)

Suppose M⋆ is well-conditioned, incoherent, and r = O(1). Assume ∆l > 2c0σ

  • log n

for some const c0 > 0 (1)

slide-49
SLIDE 49

15/28

Rank-r: eigenvalue / eigenvector perturbation

(eigenvalue separation) ∆l := min

k:k=l |λ⋆ l − λ⋆ k|

∆2 & λı

1 ı 1 λı 2 ı 2 λı 3

Theorem 4 (Cheng, Wei, Chen ’20)

Suppose M⋆ is well-conditioned, incoherent, and r = O(1). Assume ∆l > 2c0σ

  • log n

for some const c0 > 0 (1) With high prob., lth largest e-value λeigs

l

& e-vector ueigs

l

  • f M obey
  • λeigs

l

− λ⋆

l

  • ≤ c0σ
  • log n

min ueigs

l

± u⋆

l 2 σ√log n

∆⋆

l

+ σ√n log n M⋆

slide-50
SLIDE 50

15/28

Rank-r: eigenvalue / eigenvector perturbation

(eigenvalue separation) ∆l := min

k:k=l |λ⋆ l − λ⋆ k|

∆2 & λı

1 ı 1 λı 2 ı 2 λı 3

Theorem 4 (Cheng, Wei, Chen ’20)

Suppose M⋆ is well-conditioned, incoherent, and r = O(1). Assume ∆l > 2c0σ

  • log n

for some const c0 > 0 (1) With high prob., lth largest e-value λeigs

l

& e-vector ueigs

l

  • f M obey
  • λeigs

l

− λ⋆

l

  • ≤ c0σ
  • log n

min ueigs

l

± u⋆

l 2 σ√log n

∆⋆

l

+ σ√n log n M⋆ Similar bounds for entrywise perturbation and linear forms perturbation.

SVD vs. Eigen-decomposition

  1. Eigenvalue estimation: eigen-decomposition is O(√n) times more accurate:
     |λsvd_l − λ⋆_l| ≲ σ√n   (Weyl)      vs.      |λeigs_l − λ⋆_l| ≲ σ√(log n)   (ours).

  2. Eigenvector estimation: eigen-decomposition works under an O(√n)-times smaller eigenvalue separation:
     min± ‖usvd_l ± u⋆_l‖₂ = o(1) if ∆⋆_l ≫ σ√n   (Davis-Kahan)      vs.      min± ‖ueigs_l ± u⋆_l‖₂ = o(1) if ∆⋆_l ≫ σ√(log n)   (ours).

(The same bounds hold for eigen-decomposition on the symmetrized matrix (M + M⊤)/2 as for SVD on M.)

Summary of estimation from eigen-decomposition on asymmetric noise

  • no need of bias correction
  • faithful eigenvector estimation under much smaller eigenvalue separation
  • distribution-free
  • adaptive to heteroscedastic noise

Cool! Can we do more?

Part 3: Inference

Problem setup

M = Σ_{l=1}^{r} λ⋆_l u⋆_l u⋆_l⊤ (= M⋆) + H ∈ ℝ^{n×n}

  • Goal: infer the eigenvalues λ⋆_l and the linear forms a⊤u⋆_l.
  • H: noise matrix with
      • independent entries: {H_ij} are independent;
      • zero mean: E[H_ij] = 0;
      • variance: σ²_min ≤ Var(H_ij) ≤ σ²_max ≪ (λ⋆_min)²/(n log n), with σ_max/σ_min = O(1);
      • magnitudes: |H_ij| ≤ σ_max·√(n/log n) with high probability.
  • M⋆ obeys the incoherence condition ‖u⋆_k‖_∞ ≤ √(μ/n).
  • Can we quantify the uncertainty of the proposed estimators? Do they
achieve statistical optimality?

Which estimator shall we use?

A natural starting point: λ_l and a⊤u_l (or a⊤w_l), where λ_l is the lth eigenvalue of M and u_l (resp. w_l) is the associated right (resp. left) eigenvector.

  • λ_l: a good enough, nearly unbiased estimator of λ⋆_l!
  • a⊤u_l (or a⊤w_l): not so good for a⊤u⋆_l.

Issues:

  • bias aggregation: even though u_l is a nearly unbiased estimate of u⋆_l in every entry, this does NOT mean that a⊤u_l is nearly unbiased;
  • optimality? it is unclear whether a⊤u_l incurs minimal uncertainty.
Key observation: rank-1 case

The Neumann series gives
    u₁ = (λ⋆₁·(u⋆₁⊤u₁)/λ₁) · Σ_{s=0}^{+∞} (H/λ₁)^s u⋆₁.

Hence
    a⊤u₁ ≈ (u⋆₁⊤u₁) · ( a⊤u⋆₁ + (a⊤H u⋆₁)/λ⋆₁ ).

The plug-in estimator a⊤u₁ is an underestimate: it is shrunk by a factor of approximately u⋆₁⊤u₁.

How can we reduce the bias? Combine the right and left eigenvectors:
    û_{a,l} = ( (a⊤u_l)(a⊤w_l) / (u_l⊤w_l) )^{1/2}  (with the appropriate sign).

The bias term has been canceled out!
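A quick numerical check of this cancellation (a sketch: the noise level is exaggerated so that the plug-in shrinkage is visible, the direction a is set to u⋆ so that a⊤u⋆ = 1, and global signs are fixed by aligning with u⋆):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
sigma = 0.2 / np.sqrt(n)            # strong noise, so the plug-in bias is visible
u_star = np.ones(n) / np.sqrt(n)
a = u_star                          # target linear form: a^T u* = 1

def top_eigvec(A):
    """Real part of the eigenvector of the largest-magnitude eigenvalue."""
    evals, evecs = np.linalg.eig(A)
    v = np.real(evecs[:, np.argmax(np.abs(evals))])
    return v if v @ u_star > 0 else -v          # fix the global sign

plug, deb = [], []
for _ in range(100):
    M = np.outer(u_star, u_star) + sigma * rng.standard_normal((n, n))
    u1 = top_eigvec(M)          # right eigenvector
    w1 = top_eigvec(M.T)        # left eigenvector
    plug.append(a @ u1)
    deb.append(np.sqrt((a @ u1) * (a @ w1) / (u1 @ w1)))

print(f"target 1.0 | plug-in mean {np.mean(plug):.4f} | de-biased mean {np.mean(deb):.4f}")
```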

Rank-r: distributional characterization

Assume:

  • M⋆ is well-conditioned and incoherent, and r = O(1);
  • (size of the target quantity) (1/‖a‖₂)·|a⊤u⋆_l| = o( (1/√(log n))·min{∆⋆_l/|λ⋆_l|, 1} );
  • (size of the "interferers") (1/‖a‖₂)·|a⊤u⋆_k| = o( (1/√(log n))·|λ⋆_l − λ⋆_k|/|λ⋆_l| ) for all k ≠ l;
  • (minimal eigenvalue separation) σ_max·√(log n) = o(∆⋆_l).

Theorem 5 (Cheng, Wei, Chen '20)
Under the above assumptions, with high probability,
    û_{a,l} ≈ a⊤u⋆_l + (1/(2λ⋆_l))·a⊤(H + H⊤)u⋆_l,
which is approximately N(a⊤u⋆_l, v⋆_{a,l}).

Theorem 6 (Cheng, Wei, Chen '20)
Under the above assumptions, with high probability,
    λ_l ≈ λ⋆_l + u⋆_l⊤ H u⋆_l,
which is approximately N(λ⋆_l, v⋆_{λ,l}).
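A Monte-Carlo sanity check of Theorem 6's Gaussian approximation, in a rank-1 toy case with a known (oracle) variance v⋆_λ = Σ_ij Var(H_ij)·(u⋆_i u⋆_j)² (a sketch; the dimension, noise profile, and trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam_star = 400, 1.0
u_star = np.ones(n) / np.sqrt(n)
base = 0.1 / np.sqrt(n * np.log(n))
var_H = (base * (1.0 + rng.random((n, n)))) ** 2   # heteroscedastic entry variances

# oracle variance of u*^T H u* (Theorem 6): sum_ij Var(H_ij) (u*_i u*_j)^2
v_lam = np.sum(var_H * np.outer(u_star**2, u_star**2))

z = []
for _ in range(200):
    H = np.sqrt(var_H) * rng.standard_normal((n, n))
    evals = np.linalg.eigvals(lam_star * np.outer(u_star, u_star) + H)
    lam = np.real(evals[np.argmax(np.abs(evals))])
    z.append((lam - lam_star) / np.sqrt(v_lam))

print(f"standardized errors: mean {np.mean(z):+.3f}, std {np.std(z):.3f} (should be near 0 and 1)")
```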

Rank-r: confidence intervals & optimality

  • The variances v⋆_{a,l} and v⋆_{λ,l} can both be faithfully estimated (by v̂_{a,l} and v̂_{λ,l}).
  • This yields (1 − α)-confidence intervals
        û_{a,l} ± Φ⁻¹(1 − α/2)·√(v̂_{a,l})   and   λ_l ± Φ⁻¹(1 − α/2)·√(v̂_{λ,l}).
  • Can they be further shortened? No: they are optimal! For H_ij i.i.d. ∼ N(0, σ²) and a⊤u⋆_l = o(1), Cramér-Rao lower bounds follow:

Theorem 7 (Cheng, Wei, Chen '20)
Any unbiased estimator Û_a (resp. Λ̂) of a⊤u⋆_l (resp. λ⋆_l) obeys
    Var[Û_a] ≥ (1 − o(1))·Var[ (1/(2λ⋆_l))·a⊤(H + H⊤)u⋆_l ] = (1 − o(1))·v⋆_{a,l},
    Var[Λ̂] ≥ (1 − o(1))·Var[ u⋆_l⊤ H u⋆_l ] = (1 − o(1))·v⋆_{λ,l}.
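As a small illustration of how these intervals are assembled from an estimate and an estimated variance (a sketch; scipy's normal quantile is used, and the numeric inputs are placeholders rather than values from the experiments):

```python
from scipy.stats import norm

def confidence_interval(estimate, v_hat, alpha=0.05):
    """Two-sided (1 - alpha) normal CI: estimate +/- z_{1-alpha/2} * sqrt(v_hat)."""
    z = norm.ppf(1 - alpha / 2)
    half = z * v_hat ** 0.5
    return estimate - half, estimate + half

# illustrative numbers only
lo, hi = confidence_interval(estimate=0.1, v_hat=2.5e-5)
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")
```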

Experiment: estimating a⊤u⋆₂

  • rank-2: λ⋆₁ = 1, λ⋆₂ = 0.95, a⊤u⋆₁ = 0, a⊤u⋆₂ = 0.1.
  • heteroscedastic Gaussian noise: [Var(H_ij)]_{i,j} = (σ₁², (σ₁ + δσ)², ..., (σ₁ + (n−1)δσ)²)⊤ 1_n⊤, i.e., row i has standard deviation σ₁ + (i−1)δσ, with σ₁ = 0.1/√(n log n) and δσ = 0.4/(n√(n log n)).

[Figure: 95% confidence intervals for a⊤u⋆₂ across the trials, and a Q-Q (quantile-quantile) plot of the standardized errors (û_{a,l} − a⊤u⋆_l)/√(v̂_{a,l}) against standard normal quantiles.]

Experiment: estimating a⊤u⋆₂

  • Recall that our theory requires control of the "interferers" {a⊤u⋆_k}_{k≠l}.
  • Numerically, it does seem that these "interferers" cannot be too large.
Experiment: estimating a⊤u⋆₂

  • rank-2: λ⋆₁ = 1, λ⋆₂ = 0.95, a⊤u⋆₁ = 0.2, a⊤u⋆₂ = 0.1.
  • the same heteroscedastic Gaussian noise as above.

[Figure: 95% confidence intervals for a⊤u⋆₂ across the trials, and the corresponding Q-Q (quantile-quantile) plot of the standardized errors.]
Experiment: estimating λ⋆₂

  • rank-2: λ⋆₁ = 1, λ⋆₂ = 0.95, a⊤u⋆₁ = 0, a⊤u⋆₂ = 0.1.
  • the same heteroscedastic Gaussian noise as above.

[Figure: 95% confidence intervals for λ⋆₂ across the trials, and a Q-Q (quantile-quantile) plot of the standardized errors (λ_l − λ⋆_l)/√(v̂_{λ,l}) against standard normal quantiles.]

Experiment: other settings

  Settings                           Target                Coverage rate
  -----------------------------------------------------------------------
  heteroscedastic Gaussian noise     linear form a⊤u⋆₂     0.953
                                     eigenvalue λ⋆₂        0.950
  heteroscedastic Bernoulli noise    linear form a⊤u⋆₂     0.955
                                     eigenvalue λ⋆₂        0.942
  missing data + noise               linear form a⊤u⋆₂     0.947
                                     eigenvalue λ⋆₂        0.954

Table 1: Numerical coverage rates of our 95% confidence intervals over 1000 independent trials.

  • Our theory is corroborated by experiments!
Conclusions

Eigen-decomposition without symmetrization can be very powerful:

  • effective under minimal eigenvalue separation
  • distribution-free
  • adaptive to heteroscedastic noise
  • enables "fine-grained" inference
  • statistically optimal

  • C. Cheng, Y. Wei, Y. Chen, "Inference for linear forms of eigenvectors under minimal eigenvalue separation: Asymmetry and heteroscedasticity", arXiv:2001.04620, 2020.
  • Y. Chen, C. Cheng, J. Fan, "Asymmetry helps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices", arXiv:1811.12804, 2018 (authors listed in alphabetical order); accepted to the Annals of Statistics, 2020.