SLIDE 1

Matrix Completion from a Few Entries

Raghunandan Keshavan, Andrea Montanari and Sewoong Oh

Stanford University

Physics of Algorithms Santa Fe - Aug 31, 2009

R.Keshavan, A.Montanari, S.Oh (Stanford) Physics of Algorithms 2009 Aug 31, 2009 1 / 24

SLIDE 2

Motivating Example : Recommender System

Netflix Challenge

[Figure: M = a partially observed user-movie ratings matrix, with integer entries 1-5]

5 · 10^5 users, 2 · 10^4 movies, 10^8 ratings

SLIDE 3

The Model

SLIDE 4

Matrix Completion Problem

[Figure: M (n × nα, low rank) = U (n × r) · Σ (r × r) · V^T (r × nα)]

1. Low-rank matrix M = U Σ V^T, with U^T U = n·I, V^T V = nα·I, Σ = diag(Σ_1, Σ_2, . . . , Σ_r).
2. Uniformly random sample E ⊂ [n] × [nα], given its size |E|. (Here [k] = {1, 2, . . . , k}.)


SLIDE 6

Matrix Completion Problems

For any estimate M̂, let RMSE = (1/√(mn)) ||M − M̂||_F , where ||A||_F^2 = Σ_ij A_ij^2.
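As a concrete check of this definition, the RMSE and the Frobenius norm can be computed directly (a minimal NumPy sketch; the helper name `rmse` is ours, not from the talk):

```python
import numpy as np

def rmse(M, M_hat):
    """RMSE = (1/sqrt(m*n)) * ||M - M_hat||_F, as defined on the slide."""
    m, n = M.shape
    return np.linalg.norm(M - M_hat, "fro") / np.sqrt(m * n)

# ||A||_F^2 is the sum of squared entries:
A = np.array([[3.0, 4.0], [0.0, 0.0]])
print(np.linalg.norm(A, "fro") ** 2)  # 3^2 + 4^2 = 25.0
```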

SLIDE 7

Matrix Completion Problems

• Q1. How many samples do we need to get RMSE ≤ δ? Degrees of freedom: (1 + α)rn. Answer: |E| = O(nr).
• Q2. How many samples do we need to recover M exactly? Coupon-collector bound: n log n. Answer: |E| = O(n log n).
• Q3. What if the samples are corrupted by noise?


SLIDE 13

Pathological Example

M = e_1 e_1^T (n × n): a single nonzero entry M_11 = 1, all other entries 0.

P(observing M_11) = |E| / n^2
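A quick simulation illustrates why this example is pathological: with |E| = O(n) uniform samples out of n^2 positions, the single informative entry M_11 is almost never observed (sketch; the parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 2000
E_size = 5 * n                      # |E| = O(n) samples out of n^2 positions

hits = 0
for _ in range(trials):
    # uniformly random sample E of given size; position 0 is entry (1, 1)
    E = rng.choice(n * n, size=E_size, replace=False)
    hits += 0 in E

print(hits / trials, E_size / n**2)  # empirical vs. theoretical |E|/n^2
```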


SLIDE 15

Incoherence Property

M is (µ0, µ1)-incoherent if
A1. M_max ≤ µ0 Σ_1 √r ,
A2. Σ_{a=1}^r U_ia^2 ≤ µ1 r and Σ_{a=1}^r V_ja^2 ≤ µ1 r , for all i, j.

[Candès, Recht 2008 [1]]

SLIDE 16

Previous Work

Theorem (Candès, Recht 2008 [1])
Let M be an n × nα matrix of rank r satisfying the (µ0, µ1)-incoherence condition. If
|E| ≥ C(α, µ0, µ1) r n^{6/5} log n ,
then w.h.p. Semidefinite Programming reconstructs M exactly.

SLIDE 17

Main Contributions

Questions | Main Results
1. RMSE | ≤ C(α) (nr / |E|)^{1/2}
2. Exact Reconstruction | |E| = O(n log n)
3. Noisy Reconstruction (N = M + Z) | |E| = O(n log n), (1/√(mn)) ||M − M̂||_F ≤ C (n √(αr) / |E|) ||Z^E||_2
4. Complexity | O(|E| r log n)


SLIDE 22

The Algorithm and Main Theorems

SLIDE 23

Naïve Approach

M^E_ij = M_ij if (i, j) ∈ E, and 0 otherwise.

SVD: M^E = Σ_{k=1}^n x_k σ_k y_k^T

Rank-r projection: P_r(M^E) ≡ (n^2 α / |E|) Σ_{k=1}^r x_k σ_k y_k^T
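The rank-r projection above is a single truncated SVD plus a rescaling; a minimal NumPy sketch (function and variable names are ours, not from the talk):

```python
import numpy as np

def rank_r_projection(M_E, mask, r):
    """P_r(M^E): keep the top-r SVD terms of the zero-filled observed
    matrix, rescaled by n^2*alpha/|E| to compensate for the missing mass."""
    n, m = M_E.shape                        # m = n*alpha, so n*m = n^2*alpha
    scale = n * m / mask.sum()              # n^2*alpha / |E|
    U, s, Vt = np.linalg.svd(M_E, full_matrices=False)
    return scale * (U[:, :r] * s[:r]) @ Vt[:r, :]
```

With all entries observed, the scale factor is 1 and this reduces to the classical best rank-r approximation of M.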

SLIDE 24

Naïve Approach Fails

Define : deg(row_i) ≡ # of samples in row i. For |E| = O(n), M^E has spurious singular values of Ω(√(log n / log log n)).

Solution : Trimming

M̃^E_ij = 0 if deg(row_i) > 2 E[deg(row_i)] or deg(col_j) > 2 E[deg(col_j)], and M̃^E_ij = M^E_ij otherwise.
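The trimming step is only a few lines; this sketch substitutes the empirical average degree for the expectation E[deg] (an assumption on our part):

```python
import numpy as np

def trim(M_E, mask):
    """Zero out rows/columns with more than twice the average number of
    observed entries, then return the trimmed matrix."""
    M_t = M_E.copy()
    row_deg = mask.sum(axis=1)              # deg(row_i)
    col_deg = mask.sum(axis=0)              # deg(col_j)
    M_t[row_deg > 2 * row_deg.mean(), :] = 0.0
    M_t[:, col_deg > 2 * col_deg.mean()] = 0.0
    return M_t
```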


SLIDE 27

The Algorithm

OptSpace
Input : sample positions E, sample values M^E, rank r
Output : estimate M̂
1: Trim M^E, and let M̃^E be the output;
2: Compute the rank-r projection P_r(M̃^E) = X_0 S_0 Y_0^T;
3:

SLIDE 28

Main Result

Theorem (Keshavan, Montanari, Oh, 2009 [2])
Let M be an n × nα matrix of rank r with entries bounded by M_max. Then, w.h.p.,
(1/(n M_max)) ||M − P_r(M̃^E)||_F ≤ C(α) (nr / |E|)^{1/2} ,
i.e. RMSE ≤ C(α) M_max (nr / |E|)^{1/2}.

SLIDE 29

The Algorithm

OptSpace
Input : sample positions E, sample values M^E, rank r
Output : estimate M̂
1: Trim M^E, and let M̃^E be the output;
2: Compute the rank-r projection P_r(M̃^E) = X_0 S_0 Y_0^T;
3: Minimize F(X, Y) by gradient descent starting at (X_0, Y_0), where
F(X, Y) = min_{S ∈ R^{r×r}} Σ_{(i,j) ∈ E} ( M_ij − (X S Y^T)_ij )^2
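The inner minimization over S in F(X, Y) is linear least squares in the r^2 entries of S, so the objective can be evaluated exactly at each descent step. A sketch (our own helper, written in the slide's notation):

```python
import numpy as np

def F(X, Y, M, mask):
    """F(X,Y) = min_S sum_{(i,j) in E} (M_ij - (X S Y^T)_ij)^2.
    Each observed (i,j) contributes one linear equation in vec(S):
        sum_{a,b} X[i,a] * S[a,b] * Y[j,b] = M[i,j]."""
    r = X.shape[1]
    rows, cols = np.nonzero(mask)
    # Design matrix: row k holds the outer product X[rows[k]] x Y[cols[k]].
    A = np.einsum("ka,kb->kab", X[rows], Y[cols]).reshape(-1, r * r)
    b = M[rows, cols]
    s_vec = np.linalg.lstsq(A, b, rcond=None)[0]
    resid = b - A @ s_vec
    return float(resid @ resid), s_vec.reshape(r, r)
```

OptSpace then runs gradient descent on (X, Y); this sketch only evaluates the objective and recovers the optimal S.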

SLIDE 30

Main Result

Theorem (Keshavan, Montanari, Oh, 2009 [2])
Assume r = O(1), and let M be an n × nα matrix satisfying (µ0, µ1)-incoherence with σ_1(M)/σ_r(M) = O(1). If |E| ≥ C′ n log n, then OptSpace returns, w.h.p., the matrix M.

SLIDE 31

Comparison

Theorem (Candès, Tao, 2009 [3])
Assume a strongly incoherent matrix M. If |E| ≥ C r n (log n)^6, then Semidefinite Programming returns, w.h.p., the matrix M.


SLIDE 34

Corrupted Observations

N = M + Z for any n × nα matrix Z; N^E (defined analogously to M^E) is the input to OptSpace.

Theorem (Keshavan, Montanari, Oh, 2009 [4])
Let N = M + Z with M as above and Z any n × nα matrix. If |E| ≥ C′ n log n, then OptSpace with input N^E returns M̂ such that, w.h.p.,
(1/√(mn)) ||M − M̂||_F ≤ C (n √(αr) / |E|) ||Z^E||_2 ,
provided the right-hand side is smaller than Σ_r.

(||A||_2 = sup_{v≠0} ||Av|| / ||v||, the operator norm.)


SLIDE 37

Corrupted Observations

Z_ij independent, E{Z_ij} = 0, and P{|Z_ij| ≥ x} ≤ C e^{−x^2/(2σ^2)}.

Theorem (Keshavan, Montanari, Oh, 2009 [4])
If Z is a random matrix with entries drawn as above, then
||Z̃^E||_2 ≤ C σ √α (|E| log |E| / n)^{1/2} .

Corollary
Let N = M + Z, with Z distributed as above. Then OptSpace with input N^E returns M̂ such that
(1/√(mn)) ||M − M̂||_F ≤ C α σ (nr / |E|)^{1/2} √(log |E|) .

SLIDE 38

Simulation Results

M = U V^T, U_ij ∼ N(0, 1), V_ij ∼ N(0, 1); r = 10, α = 1, n = 1000. M is declared recovered if ||M̂ − M||_F / ||M||_F < 10^{−4}.

[Figure: recovery rate vs. |E|/n for OptSpace, FPCA, SVT, and ADMiRA, plotted against the information-theoretic lower bound]
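The experiment can be reproduced in spirit at a smaller scale. This sketch checks the slide's recovery criterion for the spectral estimate alone (the rescaled rank-r SVD truncation from the earlier slides), which typically does not reach the 10^-4 threshold without OptSpace's gradient-descent refinement; the parameters are scaled down and illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 4                         # smaller than the slide's n = 1000, r = 10
U = rng.normal(size=(n, r))
V = rng.normal(size=(n, r))
M = U @ V.T                           # rank-r ground truth, alpha = 1

mask = rng.random((n, n)) < 40 / n    # roughly |E|/n = 40 samples per row
M_E = np.where(mask, M, 0.0)

# Spectral estimate: rescaled rank-r truncation of the observed matrix.
Us, s, Vt = np.linalg.svd(M_E, full_matrices=False)
M_hat = (n * n / mask.sum()) * (Us[:, :r] * s[:r]) @ Vt[:r, :]

rel_err = np.linalg.norm(M - M_hat, "fro") / np.linalg.norm(M, "fro")
recovered = rel_err < 1e-4            # the slide's recovery criterion
print(rel_err, recovered)
```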

SLIDE 39

Simulation Results

U_ij ∼ N(0, σ^2 = 20/√n) [5], Z_ij ∼ N(0, 1); r = 2, α = 1, n = 600.

[Figure: RMSE vs. |E|/n for Convex Relaxation, ADMiRA, and OptSpace, plotted against the oracle bound]

SLIDE 40

Conclusion

Main Results
1. RMSE ≤ δ : |E| = O(nr)
2. Exact : |E| = O(n log n)
3. Noisy : RMSE ≤ C (n √(αr) / |E|) ||Z^E||_2
4. Complexity : O(|E| r log n)

What's left?
1. Prior knowledge of rank? RankEstimation is exact w.h.p. for |E| = O(n)
2. r = Θ(n^β) ? Suboptimal bound

Thank you!


SLIDE 46

References

[1] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. arXiv:0805.4471, 2008.
[2] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. arXiv:0901.3150, January 2009.
[3] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. arXiv:0903.1476, 2009.
[4] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. arXiv:0906.2027, June 2009.
[5] E. J. Candès and Y. Plan. Matrix completion with noise. arXiv:0903.3131, 2009.

SLIDE 47

[Figure: singular-value histograms of the untrimmed vs. trimmed SVD; trimming removes the spurious singular values above the bulk edge c √(|E|/n), leaving the signal singular value near Σ_1 |E|/n]