

SLIDE 1

Guaranteed Rank Minimization via Singular Value Projections

Inderjit S. Dhillon University of Texas at Austin Workshop on Algorithms for Massive Data Processing IIT Kanpur Dec 18, 2009

Joint work with Raghu Meka, Prateek Jain

Inderjit S. Dhillon University of Texas at Austin Guaranteed Rank Minimization via Singular Value Projections

SLIDE 2

Overview

Affine Constrained Rank Minimization Problem (ARMP)
Singular Value Projection algorithm (SVP)
Analysis
Matrix Completion
Results
Conclusions

SLIDE 3

Rank Minimization Problem (RMP)

(RMP): min rank(X) s.t. X ∈ C, where C is a convex set, e.g., a polyhedral set.

Applications:

Machine Learning
Computer Vision
Control Theory

SLIDE 4

Affine Constrained Rank Minimization Problem (ARMP)

(ARMP): min_X rank(X) s.t. A(X) = b, where X ∈ R^{m×n}, A : R^{m×n} → R^d, b ∈ R^d, d ≪ mn.

Applications:

Matrix completion: Netflix Challenge
Linear time-invariant systems
Embedding using missing Euclidean distances

NP-hard even to approximate within a log factor (Meka et al. '08)
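The affine constraints can be made concrete with a small numerical sketch. The Gaussian sensing matrices A_i and the inner-product form ⟨A_i, X⟩ = tr(A_i^T X) are illustrative assumptions, not the talk's specific construction.

```python
import numpy as np

def affine_map(As, X):
    # A(X) = (<A_1, X>, ..., <A_d, X>) with <A, X> = tr(A^T X) = sum(A * X):
    # one linear measurement of X per sensing matrix A_i.
    return np.array([np.sum(A * X) for A in As])

rng = np.random.default_rng(0)
m, n, k, d = 8, 6, 2, 30                 # d = 30 measurements, fewer than m*n = 48
Xstar = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k ground truth
As = [rng.standard_normal((m, n)) / np.sqrt(d) for _ in range(d)]
b = affine_map(As, Xstar)                # b = A(X*): the d affine constraints
```

ARMP then asks for the lowest-rank X consistent with these d measurements.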

SLIDE 5

An Example: Minimum Rank Matrix Completion

Netflix Challenge:

Given a few user-movie ratings
Goal: complete the ratings matrix

Small number of latent factors ≡ low rank. Special case of ARMP:

(MCP): min_X rank(X) s.t. tr(X e_j e_i^T) = b_ij, ∀(i, j) ∈ Ω.

Typically the number of samples is very small: Netflix has 1% of entries sampled

SLIDE 6

Existing Work

Various heuristics, e.g., alternating minimization, log-det relaxation
Typically no theoretical guarantees
Recent work: theoretical guarantees from generalizations of compressed sensing

SLIDE 7

ARMP: Generalization of Compressed Sensing (CS)

(CS): min_x ‖x‖₀ s.t. Ax = b, where x ∈ R^n, A : R^n → R^d, b ∈ R^d, d ≪ n (typically, d = s log n).

Specific instance of ARMP with X = Diag(x).

Technique         | CS              | ARMP
Convex relaxation | ℓ1 (Lasso)      | Trace-norm (SVT)
Greedy approach   | MP, OMP, CoSaMP | ADMiRA
Hard thresholding | IHT, GraDeS     | SVP, IHT

Table: CS vs ARMP

SLIDE 8

Restricted Isometry Property (RIP)

Most CS methods assume RIP:

(1 − δ_s)‖x‖₂² ≤ ‖Ax‖₂² ≤ (1 + δ_s)‖x‖₂², ∀x s.t. ‖x‖₀ ≤ s.

Generalization to matrices:

(1 − δ_k)‖X‖_F² ≤ ‖A(X)‖₂² ≤ (1 + δ_k)‖X‖_F², ∀X s.t. rank(X) ≤ k.

Families satisfying RIP: A(X) = A vec(X), with

A_ij ∼ N(0, 1/d), or
A_ij = +1/√d with probability 1/2, −1/√d with probability 1/2.
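The concentration behind matrix RIP can be observed empirically: for a random Gaussian map, the ratio ‖A(X)‖₂²/‖X‖_F² stays near 1 over random rank-k matrices. The dimensions and sample count below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k, d = 10, 10, 2, 400
A = rng.standard_normal((d, m * n)) / np.sqrt(d)   # A(X) = A vec(X), A_ij ~ N(0, 1/d)

# For random rank-k matrices X, the ratio ||A vec(X)||^2 / ||X||_F^2 should
# concentrate near 1; the largest observed deviation lower-bounds delta_k.
deviations = []
for _ in range(200):
    X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
    y = A @ X.ravel()
    deviations.append(abs(y @ y / np.sum(X * X) - 1.0))
delta_lower_bound = max(deviations)
```

Note this only lower-bounds δ_k (a maximum over samples, not over all rank-k matrices); proving RIP requires a covering argument over the rank-k manifold.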

SLIDE 9

Singular Value Projection (SVP)

(RARMP): min_X ψ(X) = ½‖A(X) − b‖₂², s.t. X ∈ C(k) = {X : rank(X) ≤ k}.

Adapt classical projected gradient descent
Efficient projection onto the non-convex rank constraint

SLIDE 10

Singular Value Projection (SVP)

Algorithm 1: SVP
  Initialize X⁰ = 0, t = 0
  Set step size η_t
  repeat
    X^{t+1} = P_k(X^t − η_t A^T(A(X^t) − b))    [A^T(A(X^t) − b) = ∇ψ(X^t)]
    t = t + 1
  until convergence

P_k(X) = U_k Σ_k V_k^T: top k singular vectors, the best rank-k approximation.

X^t is low rank, stored using (m + n)k values.
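Algorithm 1 is short enough to sketch directly. This minimal NumPy version (dense measurement matrix, fixed step size, fixed iteration count in place of a convergence test) is an illustration under those assumptions, not the authors' released code.

```python
import numpy as np

def P_k(X, k):
    # Best rank-k approximation via truncated SVD (Eckart-Young).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def svp(A, b, m, n, k, eta, iters=500):
    # A is d x (m*n), representing the affine map A(X) = A vec(X).
    X = np.zeros((m, n))
    for _ in range(iters):
        grad = (A.T @ (A @ X.ravel() - b)).reshape(m, n)   # grad psi(X) = A^T(A(X) - b)
        X = P_k(X - eta * grad, k)                          # projected gradient step
    return X

rng = np.random.default_rng(2)
m, n, k, d = 20, 20, 1, 200
Xstar = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A = rng.standard_normal((d, m * n)) / np.sqrt(d)
b = A @ Xstar.ravel()                                      # exact measurements
Xhat = svp(A, b, m, n, k, eta=0.5)
```

With exact measurements and enough oversampling the residual typically decays geometrically, in line with the main result; eta = 0.5 here is a conservative stand-in for 1/(1 + δ_2k), whose true value is unknown for a sampled A.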

SLIDE 11

SVP: Main result

Theorem
Isometry constant: δ_{2k} < 1/3. Exact case: b = A(X*). Set η_t = 1/(1 + δ_{2k}). Then SVP outputs a matrix X of rank k s.t. ‖A(X) − b‖₂² ≤ ε.

Maximum number of iterations: C log(‖b‖₂²/ε)

Geometric convergence

For δ_{2k} = 1/5, η_t = 5/6, the number of iterations is log₂(‖b‖₂²/(2ε)).

SLIDE 12

SVP: Guarantees—Noisy Case

Theorem
Isometry constant δ_{2k} ≤ 1/3. Noisy case: b = A(X*) + e (e is the error vector). Set η_t = 1/(1 + δ_{2k}). Then SVP outputs X of rank k s.t.

‖A(X) − b‖₂² ≤ (C² + ε)‖e‖₂², ε ≥ 0.

Number of iterations is bounded by D log(‖b‖₂² / ((C² + ε)‖e‖₂²)).

Geometric convergence to a C-approximate solution.

SLIDES 13-17

SVP: Proof

Simple analysis: apply RIP twice and the Eckart-Young theorem once.

ψ(X) = ½‖A(X) − b‖₂² is a quadratic function, so

ψ(X^{t+1}) − ψ(X^t)
  = ⟨∇ψ(X^t), X^{t+1} − X^t⟩ + ½‖A(X^{t+1} − X^t)‖₂²             [rank(X^{t+1} − X^t) ≤ 2k]
  ≤ ⟨∇ψ(X^t), X^{t+1} − X^t⟩ + ½(1 + δ_{2k})‖X^{t+1} − X^t‖_F²    [using RIP]
  = ½(1 + δ_{2k})‖X^{t+1} − Y^{t+1}‖_F² − (1/(2(1 + δ_{2k})))‖A^T(A(X^t) − b)‖_F²
  ≤ ½(1 + δ_{2k})‖X* − Y^{t+1}‖_F² − (1/(2(1 + δ_{2k})))‖A^T(A(X^t) − b)‖_F²    [Eckart-Young theorem]

where Y^{t+1} = X^t − (1/(1 + δ_{2k}))∇ψ(X^t) and X^{t+1} = P_k(Y^{t+1}): X^{t+1} is the best rank-k approximation of Y^{t+1}, while X* also has rank at most k.

SLIDES 18-21

SVP: Proof (continued)

Expanding ‖X* − Y^{t+1}‖_F² with Y^{t+1} = X^t − (1/(1 + δ_{2k}))∇ψ(X^t):

ψ(X^{t+1}) − ψ(X^t)
  ≤ ½(1 + δ_{2k})‖X* − Y^{t+1}‖_F² − (1/(2(1 + δ_{2k})))‖A^T(A(X^t) − b)‖_F²    [Eckart-Young theorem]
  = ⟨∇ψ(X^t), X* − X^t⟩ + ½(1 + δ_{2k})‖X* − X^t‖_F²
  ≤ ⟨∇ψ(X^t), X* − X^t⟩ + ½((1 + δ_{2k})/(1 − δ_{2k}))‖A(X* − X^t)‖₂²           [using RIP]
  = ψ(X*) − ψ(X^t) + (δ_{2k}/(1 − δ_{2k}))‖A(X* − X^t)‖₂².

For the exact case, ψ(X*) = 0 and A(X*) = b, hence ‖A(X* − X^t)‖₂² = 2ψ(X^t) and

ψ(X^{t+1}) ≤ (2δ_{2k}/(1 − δ_{2k})) ψ(X^t),

where the contraction factor 2δ_{2k}/(1 − δ_{2k}) < 1 for δ_{2k} < 1/3.

SLIDE 22

Comparison to Existing Methods

Method             | Generalization of | RIP constant     | Rate of convergence | Noisy measurements
Trace-norm (RFP07) | ℓ1 relaxation     | δ_5k < 1/10      | Not known           | No
Trace-norm (LB09b) | ℓ1 relaxation     | δ_3k < 1/(4√3)   | Not known           | Yes
ADMiRA (LB09a)     | Matching pursuit  | δ_4k < 1/√32     | Geometric           | Yes
SVP                | IHT               | δ_2k ≤ 1/3       | Geometric           | Yes

Table: Comparison of the existing approaches with SVP

SLIDE 23

Matrix Completion

Complete a low-rank matrix from a few sampled entries. Minimum rank matrix completion problem:

(MCP): min_X rank(X), s.t. P_Ω(X) = P_Ω(X*),

where P_Ω : R^{m×n} → R^{m×n} is the projection onto the index set Ω, i.e.,

(P_Ω(X))_ij = X_ij for (i, j) ∈ Ω, and 0 otherwise.

Special case of ARMP: SVP can be applied directly.
Problem: MCP does not satisfy RIP in general.
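The projection P_Ω is just entrywise masking. A minimal sketch, with Ω encoded as a boolean mask (an implementation choice of this example):

```python
import numpy as np

def P_Omega(X, mask):
    # (P_Omega(X))_ij = X_ij for (i, j) in Omega, 0 otherwise;
    # Omega is encoded as a boolean matrix with the same shape as X.
    return np.where(mask, X, 0.0)

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 5))
mask = rng.random((4, 5)) < 0.3        # Omega sampled uniformly, density ~0.3
Y = P_Omega(X, mask)
```

The constraint P_Ω(X) = P_Ω(X*) then says exactly that X agrees with X* on the observed entries.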

SLIDE 24

Existing Work: Matrix Completion

Most ARMP methods are applicable to MCP. Exact recovery:

Trace-norm relaxation: Recht and Candes '08, Candes and Tao '09
SVD + Alternating Minimization: Keshavan et al. '09

Assumptions: uniform sampling, incoherence.

Definition (Incoherence)
X ∈ R^{m×n} with SVD X = UΣV^T is µ-incoherent if max_{i,j} |U_ij| ≤ √µ/√m and max_{i,j} |V_ij| ≤ √µ/√n.
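The incoherence parameter of a given matrix can be computed directly from its SVD. This helper (its name and the rank tolerance are my own) returns the smallest µ satisfying the definition above:

```python
import numpy as np

def incoherence(X):
    # Smallest mu with max|U_ij| <= sqrt(mu)/sqrt(m) and max|V_ij| <= sqrt(mu)/sqrt(n),
    # where X = U Sigma V^T is the SVD restricted to the nonzero singular values.
    m, n = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))            # numerical rank
    return max(m * np.max(U[:, :r] ** 2), n * np.max(Vt[:r] ** 2))

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # random low-rank
mu = incoherence(X)          # random low-rank matrices are incoherent w.h.p. (small mu)
e = np.zeros((50, 40)); e[0, 0] = 1.0
mu_spiky = incoherence(e)    # a single-entry matrix is maximally coherent: mu = m = 50
```

The spiky example shows why incoherence is needed: a matrix supported on one entry cannot be recovered unless that entry happens to be sampled.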

SLIDE 25

SVP: Matrix Completion

(RMCP): min_X ψ(X) = ½‖P_Ω(X − X*)‖_F², s.t. X ∈ C(k) = {X : rank(X) ≤ k}.

Algorithm 2: SVP for Matrix Completion
  Initialize X⁰ = 0, t = 0
  Set step size η_t = 1/((1 + δ)p), where p is the sampling density and δ is a parameter
  repeat
    X^{t+1} = P_k(X^t − η_t P_Ω(X^t − X*))
    t = t + 1
  until convergence

P_k(X) = U_k Σ_k V_k^T: top k singular vectors of X.

Computing the top k singular vectors of X^t (low rank) − η_t P_Ω(X^t − X*) (sparse): matrix-vector multiplication costs O((m + n)k + |Ω|).
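Algorithm 2 fits in a few lines of NumPy. This sketch substitutes a dense SVD for the sparse top-k computation and a fixed iteration count for a convergence test, so it is an illustration rather than the released code:

```python
import numpy as np

def P_k(X, k):
    # Best rank-k approximation via truncated SVD.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def svp_complete(M_obs, mask, k, p, delta=1/3, iters=300):
    # M_obs holds P_Omega(X*) (zeros off Omega); step size 1/((1+delta)p) as in Algorithm 2.
    eta = 1.0 / ((1.0 + delta) * p)
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        residual = np.where(mask, X - M_obs, 0.0)   # P_Omega(X^t - X*)
        X = P_k(X - eta * residual, k)
    return X

rng = np.random.default_rng(5)
m, n, k, p = 60, 50, 2, 0.5
Xstar = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
mask = rng.random((m, n)) < p                       # Omega sampled uniformly
Xhat = svp_complete(np.where(mask, Xstar, 0.0), mask, k, p)
```

The dense SVD here costs O(mn·min(m, n)) per iteration; reaching the O((m + n)k + |Ω|) matrix-vector cost quoted above requires an iterative truncated SVD that exploits the low-rank-plus-sparse structure, which this sketch omits.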

SLIDE 26

Weak RIP

We show the following weak RIP:

Theorem (Weak RIP)
Let 0 < δ < 1 and let the sampling density satisfy p ≥ Cµ²k² log n/(δ²m). Then, with high probability, for all rank-k, µ-incoherent matrices X,

(1 − δ)p‖X‖_F² ≤ ‖P_Ω(X)‖_F² ≤ (1 + δ)p‖X‖_F².

Similar to RIP, but only for incoherent matrices.
If every iterate is incoherent ⟹ SVP is optimal.

SLIDE 27

Matrix Completion: Convergence Proof?

SVP converges if:

Conjecture (Incoherence)
Let X and X* be rank-k, µ-incoherent matrices and set η_t < 1. Then Y = P_k(X − η_t P_Ω(X − X*)) is (1 + ε)µ-incoherent for small ε.

Figure: Empirical estimates of incoherence and sampling density threshold (matches Ck log n/n, C = 1.28)

SLIDE 28

Results: ARMP

Synthetic datasets:

Generate a random X* of rank k
Generate A_i's randomly, b_i = tr(A_i X*)

MIT Logo:

X* obtained from the MIT logo image of size 38 × 73, rank 4
Generate A_i's randomly, b_i = tr(A_i X*)

Figure: MIT Logo

Compare against an adaptation of SVT (trace-norm relaxation)

SLIDE 29

Results: ARMP

Figure: (a) Time taken by SVP and SVT for random instances with optimal rank k = 5; (b) error for the MIT logo

SLIDE 30

Results: Matrix Completion

Synthetic datasets:

Generate a random X* of rank k
Generate Ω uniformly with sampling density p

MovieLens dataset:

User-movie ratings matrix: 1 million ratings for 3900 movies by 6040 users

Compare against:

SVT (Cai et al. '08)
SMC (Keshavan et al. '09)
ADMiRA (Lee and Bresler '09)
Alternating least squares (ALS): our implementation

SLIDE 31

Results: Matrix Completion for Synthetic Datasets

Figure: Running time (log scale) for different sizes and ranks


SLIDE 32

Results: Matrix Completion for Noisy Synthetic Datasets

Figure: Noise level: 10% corrupt samples


SLIDE 33

Results: Matrix Completion for the MovieLens Dataset

Method | RMSE | Time
SVP    | 1.01 | 64.85
SVT    | 1.21 | 1214.78
ALS    | 0.90 | 195.34

Table: RMSE obtained and time taken by various methods

Problem: the ratings matrix is not sampled uniformly

Figure: Cumulative degree distribution of users (MovieLens)

SLIDE 34

Conclusions and Future Work

Singular Value Projection (SVP) algorithm:

Simple analysis for ARMP (with RIP)
Partial progress for matrix completion
SVP is much faster than existing methods

Future work:

Optimality of SVP for matrix completion
Other sampling distributions, e.g., power-law distributions
Hard thresholding algorithms for other problems, e.g., sparse + low-rank matrix decomposition

Paper available at: http://arxiv.org/abs/0909.5457
Code available at: http://www.cs.utexas.edu/users/pjain/svp/