SLIDE 1

Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices

Yuxin Chen, Electrical Engineering, Princeton University

SLIDE 2

Chen Cheng, PKU Math        Jianqing Fan, Princeton ORFE

SLIDES 3–5

Eigenvalue / eigenvector estimation

M⋆: truth  +  H: noise

  • A rank-1 matrix: M⋆ = λ⋆u⋆u⋆⊤ ∈ R^{n×n}
  • Observed noisy data: M = M⋆ + H
  • Goal: estimate the eigenvalue λ⋆ and the eigenvector u⋆

SLIDE 6

Non-symmetric noise matrix

M  =  M⋆ = λ⋆u⋆u⋆⊤  +  H: asymmetric matrix

This may arise when, e.g., we have 2 samples for each entry of M⋆ and arrange them in an asymmetric manner.

SLIDE 7

A natural estimation strategy: SVD

M  =  M⋆ = λ⋆u⋆u⋆⊤  +  H: asymmetric matrix

  • Use the leading singular value λsvd of M to estimate λ⋆
  • Use the leading left singular vector of M to estimate u⋆

SLIDE 8

A less popular strategy: eigen-decomposition

M  =  M⋆ = λ⋆u⋆u⋆⊤  +  H: asymmetric matrix

  • Use the leading eigenvalue λeigs of M to estimate λ⋆
  • Use the leading eigenvector of M to estimate u⋆
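Both strategies are one-liners on top of standard linear algebra routines. Here is a minimal NumPy sketch of the two estimators (the function names are mine, not from the slides); since M is asymmetric, its leading eigenpair can be complex, and the sketch keeps the real part as one reasonable convention:

```python
import numpy as np

def svd_estimate(M):
    # Estimate (lambda*, u*) by the leading singular value / left singular vector
    U, s, Vt = np.linalg.svd(M)
    return s[0], U[:, 0]

def eig_estimate(M):
    # Estimate (lambda*, u*) by the leading eigenvalue / eigenvector;
    # for asymmetric M the eigenpair may be complex, so keep the real part
    w, V = np.linalg.eig(M)
    k = np.argmax(np.abs(w))
    return np.real(w[k]), np.real(V[:, k])
```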

SLIDES 9–11

SVD vs. eigen-decomposition

For asymmetric matrices:

  • Numerical stability:  SVD > eigen-decomposition
  • (Folklore?) Statistical accuracy:  SVD ≍ eigen-decomposition

Shall we always prefer SVD over eigen-decomposition?

SLIDES 12–15

A curious numerical experiment: Gaussian noise

M = u⋆u⋆⊤ + H;   {H_{i,j}} i.i.d. N(0, σ²),  σ = 1/√(n log n)

[Figure: |λ − λ⋆| vs. n (n = 200 to 2000), log scale. Curves: |λsvd − λ⋆| (SVD), |λeigs − λ⋆| (Eigen-Decomposition), and the rescaled SVD error (2.5/√n)·|λsvd − λ⋆|.]

  • Empirically, |λeigs − λ⋆| ≈ (2.5/√n)·|λsvd − λ⋆|
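A sketch that reproduces this experiment under the stated model (here λ⋆ = 1 because M⋆ = u⋆u⋆⊤; the random seed and the grid of n values are my choices, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_experiment(n):
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                   # unit eigenvector u*, so lambda* = 1
    sigma = 1.0 / np.sqrt(n * np.log(n))
    M = np.outer(u, u) + sigma * rng.standard_normal((n, n))
    lam_svd = np.linalg.svd(M, compute_uv=False)[0]
    w = np.linalg.eigvals(M)
    lam_eig = np.real(w[np.argmax(np.abs(w))])
    return abs(lam_svd - 1.0), abs(lam_eig - 1.0)

for n in (200, 500, 1000, 2000):
    e_svd, e_eig = gaussian_experiment(n)
    print(n, e_svd, e_eig, (2.5 / np.sqrt(n)) * e_svd)   # last column: rescaled SVD error
```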
SLIDES 16–17

Another numerical experiment: matrix completion

M⋆ = u⋆u⋆⊤;   M_{i,j} = (1/p)·M⋆_{i,j} with prob. p, 0 else;   p = (3 log n)/n

[Figure: a matrix with most entries unobserved, marked "?".]

[Figure: |λ − λ⋆| vs. n (n = 200 to 2000), log scale. Curves: |λsvd − λ⋆| (SVD), |λeigs − λ⋆| (Eigen-Decomposition), and the rescaled SVD error (2.5/√n)·|λsvd − λ⋆|.]

  • Empirically, |λeigs − λ⋆| ≈ (2.5/√n)·|λsvd − λ⋆|
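The same comparison under the sampling model above, again as a sketch (inverse-probability weighting as in the display; seed and size are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def completion_experiment(n):
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    M_star = np.outer(u, u)                  # rank-1 truth, lambda* = 1
    p = 3 * np.log(n) / n
    mask = rng.random((n, n)) < p            # each entry observed independently w.p. p
    M = np.where(mask, M_star / p, 0.0)      # observed entries rescaled by 1/p
    lam_svd = np.linalg.svd(M, compute_uv=False)[0]
    w = np.linalg.eigvals(M)
    lam_eig = np.real(w[np.argmax(np.abs(w))])
    return abs(lam_svd - 1.0), abs(lam_eig - 1.0)

print(completion_experiment(1000))
```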
SLIDE 18

Why does eigen-decomposition work so much better than SVD?

SLIDES 19–20

Problem setup

M = u⋆u⋆⊤ + H ∈ R^{n×n}

  • H: noise matrix
  • independent entries: {H_{i,j}} are independent
  • zero mean: E[H_{i,j}] = 0
  • variance: Var(H_{i,j}) ≤ σ²
  • magnitudes: P{|H_{i,j}| ≥ B} ≲ n^{−12}
  • M⋆ obeys the incoherence condition:  max_{1≤i≤n} |e_i⊤u⋆| ≤ √(μ/n)

SLIDES 21–24

Classical linear algebra results

  • |λsvd − λ⋆| ≤ ‖H‖   (Weyl)
  • |λeigs − λ⋆| ≤ ‖H‖   (Bauer–Fike)

Combined with the matrix Bernstein inequality, ‖H‖ ≲ σ√(n log n) + B log n, these give:

  • |λsvd − λ⋆| ≲ σ√(n log n) + B log n   (reasonably tight if ‖H‖ is large)
  • |λeigs − λ⋆| ≲ σ√(n log n) + B log n   (can be significantly improved)
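As a quick, informal check of the Bernstein-type level in the Gaussian model of Slides 12–15, one can compare the observed spectral norm ‖H‖ against σ√(n log n) (illustrative only; the B log n term is omitted since Gaussian entries are unbounded):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
sigma = 1.0 / np.sqrt(n * np.log(n))
H = sigma * rng.standard_normal((n, n))
print(np.linalg.norm(H, 2))              # observed spectral norm ||H||
print(sigma * np.sqrt(n * np.log(n)))    # Bernstein-type level sigma * sqrt(n log n)
```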

SLIDES 25–26

Main results: eigenvalue perturbation

Theorem 1 (Chen, Cheng, Fan '18). With high probability, the leading eigenvalue λeigs of M obeys

    |λeigs − λ⋆| ≲ √(μ/n)·(σ√(n log n) + B log n)

[Figure: the |λ − λ⋆| vs. n plots from Slides 12–15, with the eigen-decomposition error tracking the rescaled SVD error (2.5/√n)·|λsvd − λ⋆|.]

  • Eigen-decomposition is √(n/μ) times better than SVD!
    (recall: |λsvd − λ⋆| ≲ σ√(n log n) + B log n)

SLIDES 27–31

Main results: entrywise eigenvector perturbation

Theorem 2 (Chen, Cheng, Fan '18). With high probability, the leading eigenvector u of M obeys

    min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≲ √(μ/n)·(σ√(n log n) + B log n)

  • if ‖H‖ ≪ |λ⋆|, then
        min{‖u − u⋆‖₂, ‖u + u⋆‖₂} ≪ ‖u⋆‖₂   (classical bound)
        min{‖u − u⋆‖∞, ‖u + u⋆‖∞} ≪ ‖u⋆‖∞   (our bound)
  • entrywise eigenvector perturbation is well-controlled

[Figure: min{‖u − u⋆‖∞, ‖u + u⋆‖∞} vs. n (n = 200 to 2000) for Eigen-Decomposition and SVD; {H_{i,j}} i.i.d. N(0, σ²), σ² = 1/(n log n).]
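A sketch of how one might measure the entrywise error of Theorem 2 for both estimators under the Gaussian model of this slide; the global sign ambiguity is resolved by taking the minimum over ± (seed and size are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
sigma = 1.0 / np.sqrt(n * np.log(n))
M = np.outer(u_star, u_star) + sigma * rng.standard_normal((n, n))

u_svd = np.linalg.svd(M)[0][:, 0]            # leading left singular vector
w, V = np.linalg.eig(M)
u_eig = np.real(V[:, np.argmax(np.abs(w))])
u_eig /= np.linalg.norm(u_eig)               # renormalize after taking the real part

def err_inf(u):
    # entrywise error, minimized over the global sign ambiguity
    return min(np.linalg.norm(u - u_star, np.inf), np.linalg.norm(u + u_star, np.inf))

print(err_inf(u_svd), err_inf(u_eig))
```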

SLIDES 32–33

Main results: perturbation of linear forms of eigenvectors

Theorem 3 (Chen, Cheng, Fan '18). Fix any unit vector a. With high probability, the leading eigenvector u of M obeys

    min{|a⊤(u − u⋆)|, |a⊤(u + u⋆)|} ≲ max{|a⊤u⋆|, √(μ/n)}·(σ√(n log n) + B log n)

  • if ‖H‖ ≪ |λ⋆|, then min{|a⊤(u − u⋆)|, |a⊤(u + u⋆)|} ≪ max{|a⊤u⋆|, ‖u⋆‖∞}
  • perturbation of an arbitrary linear form of the leading eigenvector is well-controlled

SLIDES 34–37

Intuition: asymmetry reduces bias

From the Neumann series (some sort of Taylor expansion), one can verify

    |λ − λ⋆| ≍ | u⋆⊤Hu⋆/λ + u⋆⊤H²u⋆/λ² + u⋆⊤H³u⋆/λ³ + · · · |

  • To develop some intuition, let's look at the 2nd-order term
  • if H is symmetric,  E[u⋆⊤H²u⋆] = E‖Hu⋆‖₂² = nσ²
  • if H is asymmetric,  E[u⋆⊤H²u⋆] = E⟨H⊤u⋆, Hu⋆⟩ = σ²
  • much smaller than the symmetric case
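A Monte Carlo sanity check of the 2nd-order term, assuming Gaussian noise; the symmetric case is built as (G + G⊤)/√2, whose off-diagonal entries have variance σ² (the diagonal differs slightly, so the match is approximate):

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma, trials = 500, 0.1, 200
u = rng.standard_normal(n)
u /= np.linalg.norm(u)

sym, asym = [], []
for _ in range(trials):
    G = sigma * rng.standard_normal((n, n))
    Hs = (G + G.T) / np.sqrt(2)          # symmetric noise, off-diagonals ~ N(0, sigma^2)
    sym.append(u @ Hs @ (Hs @ u))        # u*^T H^2 u*, symmetric case
    asym.append(u @ G @ (G @ u))         # u*^T H^2 u*, asymmetric case

print(np.mean(sym), n * sigma**2)        # ~ n * sigma^2
print(np.mean(asym), sigma**2)           # ~ sigma^2, much smaller
```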

SLIDE 38

What happens if M⋆ is also not symmetric?

  • A rank-1 matrix: M⋆ = λ⋆u⋆v⋆⊤ ∈ R^{n1×n2}
  • Suppose we observe 2 independent noisy copies: M1 = M⋆ + H1,  M2 = M⋆ + H2
  • Goal: estimate λ⋆, u⋆ and v⋆

SLIDE 39

Asymmetrization + dilation

Compute the leading eigenvalue / eigenvector of

    [ 0     M1 ]   [ 0            M⋆ + H1 ]
    [ M2⊤    0 ] = [ M⋆⊤ + H2⊤    0       ]

  • Our findings (eigenvalue / eigenvector perturbation) continue to hold for this case!
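A minimal sketch of the asymmetrization + dilation construction, with the block placement taken from the display above (the helper name is mine):

```python
import numpy as np

def dilation(M1, M2):
    """Embed two noisy copies of a rectangular M* into a square matrix.

    If M1 = M2 = lambda* u v^T with unit u, v, the top eigenvalue of the
    dilation is lambda*, with eigenvector proportional to (u; v).
    """
    n1, n2 = M1.shape
    top = np.hstack([np.zeros((n1, n1)), M1])
    bottom = np.hstack([M2.T, np.zeros((n2, n2))])
    return np.vstack([top, bottom])
```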

SLIDES 40–41

Rank-r case

M⋆: truth  +  H: noise

  • A rank-r and well-conditioned matrix: M⋆ = Σ_{i=1}^{r} λ⋆_i u⋆_i u⋆_i⊤
  • Observed noisy data: M = M⋆ + H, where {H_{i,j}} are independent
  • Goal: estimate the eigenvalues λ⋆_i

SLIDES 42–44

Eigenvalue perturbation: rank-r case

Theorem 4 (Chen, Cheng, Fan '18). With high probability, the ith largest eigenvalue λi (1 ≤ i ≤ r) of M obeys

    |λi − λ⋆_j| ≲ √(μr²/n)·(σ√(n log n) + B log n)   for some 1 ≤ j ≤ r

  • Eigen-decomposition is √(n/(μr²)) times better than SVD!
  • Might be improvable to √(μr/n)·(σ√(n log n) + B log n)?
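A sketch of the rank-r comparison, assuming a well-conditioned spectrum (the specific eigenvalues, seed, and sizes are my choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 1000, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal u*_1, ..., u*_r
lams = np.array([1.0, 0.8, 0.6])                   # well-conditioned spectrum
M_star = (Q * lams) @ Q.T                          # sum_i lams[i] * q_i q_i^T
sigma = 1.0 / np.sqrt(n * np.log(n))
M = M_star + sigma * rng.standard_normal((n, n))

w = np.linalg.eigvals(M)
lam_eig = np.sort(np.real(w[np.argsort(-np.abs(w))[:r]]))[::-1]   # top r by modulus
lam_svd = np.linalg.svd(M, compute_uv=False)[:r]
print(np.abs(lam_eig - lams))   # eigen-decomposition errors
print(np.abs(lam_svd - lams))   # SVD errors
```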

SLIDES 45–46

Concluding remarks

Eigen-decomposition could be much more powerful than SVD when dealing with non-symmetric data matrices.

Future directions:

  • Eigenvector perturbation for the rank-r case
  • Beyond i.i.d. noise

Reference: Y. Chen, C. Cheng, J. Fan, "Asymmetry helps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices," arXiv:1811.12804, 2018.