Low-rank Matrix Completion via Convex Optimization (Ben Recht, PowerPoint presentation)




SLIDE 1

Low-rank Matrix Completion via Convex Optimization

Ben Recht Center for the Mathematics of Information Caltech

SLIDE 2

Recommender Systems

SLIDE 3

Netflix Prize

  • One million big ones!
  • Given 100 million ratings on a scale of 1 to 5, predict 3 million ratings to highest accuracy
  • 17770 total movies x 480189 total users
  • Over 8 billion total entries in the full matrix
  • How to fill in the blanks?
SLIDE 4

Abstract Setup: Matrix Completion

  • How do you fill in the missing data?

X_ij known for black cells, X_ij unknown for white cells. Rows index movies, columns index users.

X = L R*, where L is k x r and R* is r x n, so X is k x n. Storing X directly takes kn entries, while storing the factors takes only r(k + n) entries.
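The entry-count comparison above can be checked numerically; a minimal NumPy sketch with illustrative sizes (not the Netflix dimensions):

```python
import numpy as np

# Illustrative sizes: k movies, n users, rank r (not the Netflix dimensions)
k, n, r = 100, 200, 5
rng = np.random.default_rng(0)
L = rng.standard_normal((k, r))   # k x r factor
R = rng.standard_normal((r, n))   # r x n factor (the slide's R*)
X = L @ R                         # k x n matrix of rank at most r

full_entries = k * n              # kn entries to store X directly
factored_entries = r * (k + n)    # r(k + n) entries to store L and R
print(full_entries, factored_entries)
```

With these sizes the factors need 1500 numbers versus 20000 for the full matrix.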

SLIDE 5

Low-rank Matrix Completion

  • How do you fill in the missing data?

X_ij known for black cells, X_ij unknown for white cells.

SLIDE 6

Rank constraints appear across applications:

  • Controller design (plant G, controller K), model reduction, and system identification: constraints involving the rank of the Hankel operator, matrix, or singular values
  • Multitask learning: rank of the matrix of classifiers
  • Euclidean embedding: rank of the Gram matrix
  • Recommender systems: rank of the data matrix

SLIDE 7

Affine Rank Minimization

  • PROBLEM: Find the matrix of lowest rank that satisfies/approximates an underdetermined linear system: minimize rank(X) subject to A(X) = b
  • NP-HARD:
    – Reduces to finding solutions to polynomial systems
    – Hard to approximate
    – Exact algorithms are awful

SLIDE 8

Proposed Heuristic

  • Proposed by Fazel (2002).
  • The nuclear norm ||X||_* (the sum of the singular values) is the "numerical rank" in numerical analysis
  • Coincides with the "trace heuristic" from controls if X is p.s.d.

Affine rank minimization: minimize rank(X) subject to A(X) = b
Convex relaxation: minimize ||X||_* subject to A(X) = b
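The relaxation swaps rank (the number of nonzero singular values) for the nuclear norm (their sum). A small NumPy illustration, not from the talk:

```python
import numpy as np

# rank counts the nonzero singular values; the nuclear norm ||X||_* sums them
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 6))  # rank 2

s = np.linalg.svd(X, compute_uv=False)
rank = int((s > 1e-10).sum())
nuclear_norm = s.sum()
print(rank, nuclear_norm)
```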

SLIDE 9
Parsimonious Models

  • Search for the best linear combination of the fewest atoms: model x = Σ_i c_i a_i, with atoms a_i and weights c_i
  • "rank" = fewest atoms needed to describe the model
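For matrices, the atoms are unit-norm rank-1 matrices u v^T, and the SVD writes a model as a weighted sum of exactly rank-many atoms. A small sketch with example data (not from the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 6))  # rank-2 model

# The SVD expresses X as a weighted sum of unit-norm rank-1 atoms u_i v_i^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
atoms = [np.outer(U[:, i], Vt[i]) for i in range(len(s))]
recon = sum(w * a for w, a in zip(s, atoms))
n_atoms = int((s > 1e-10).sum())   # fewest atoms needed = rank
print(n_atoms)
```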

SLIDE 10
  • 2x2 symmetric matrices [[x, y], [y, z]] plotted in 3d
  • The unit-norm rank-1 matrices form the set x^2 + 2y^2 + z^2 = 1 (with xz = y^2); the nuclear norm ball is their convex hull

SLIDE 11
  • 2x2 matrices plotted in 3d
  • Projection onto the x-z plane is the l1 ball
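The ellipsoid equation for the rank-1 set can be verified directly: any unit-norm rank-1 symmetric 2x2 matrix [[x, y], [y, z]] is u u^T for a unit vector u. A sketch (any angle works):

```python
import numpy as np

theta = 0.7                                    # any angle gives a valid example
u = np.array([np.cos(theta), np.sin(theta)])   # unit vector
M = np.outer(u, u)                             # rank 1, Frobenius norm 1
x, y, z = M[0, 0], M[0, 1], M[1, 1]
on_ellipsoid = x**2 + 2*y**2 + z**2            # should equal 1
det = x*z - y**2                               # zero determinant: rank 1
print(on_ellipsoid, det)
```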

SLIDE 12

[Figure: the affine constraint set A(X) = b intersecting the norm ball, drawn in coordinates w1, w2]

SLIDE 13

So how do we compute it? And when does it work?

  • The 2x2 example plotted in 3d shows the ball is not polyhedral…
SLIDE 14

Equivalent Formulations

  • Semidefinite embedding: ||X||_* = min (tr W1 + tr W2)/2 subject to [[W1, X], [X*, W2]] p.s.d.
  • Low-rank parametrization: ||X||_* = min (||L||_F^2 + ||R||_F^2)/2 subject to X = L R*
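The low-rank parametrization attains the nuclear norm at the balanced factors built from the SVD, which can be checked numerically (the semidefinite embedding would need an SDP solver, so only the factored form is sketched here):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank 3

# Balanced factors L = U sqrt(S), R = V sqrt(S) from the SVD X = U S V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U * np.sqrt(s)
R = Vt.T * np.sqrt(s)

# (||L||_F^2 + ||R||_F^2)/2 equals ||X||_* = sum of singular values here
bound = 0.5 * (np.linalg.norm(L, 'fro')**2 + np.linalg.norm(R, 'fro')**2)
print(bound, s.sum())
```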
SLIDE 15

Computationally: Gradient Descent!

  • "Method of multipliers" applied to the low-rank parametrization
  • The schedule for the penalty parameter controls the noise in the data
  • Same global minimum as nuclear norm
  • Dual certificate for the optimal solution
  • When will this fail, and when might it succeed?
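A minimal computational sketch, assuming plain gradient descent on the factored completion objective (illustrative only, not the talk's exact method-of-multipliers scheme; the sizes, step size, and penalty mu are made up):

```python
import numpy as np

# Objective: 0.5*||P_Omega(L R^T - X)||_F^2 + 0.5*mu*(||L||_F^2 + ||R||_F^2)
rng = np.random.default_rng(3)
k, n, r = 20, 20, 2
X = rng.standard_normal((k, r)) @ rng.standard_normal((r, n))  # hidden rank 2
mask = rng.random((k, n)) < 0.6        # Omega: about 60% of entries observed

L = 0.1 * rng.standard_normal((k, r))  # small random initialization
R = 0.1 * rng.standard_normal((n, r))
mu, step = 1e-4, 0.01
for _ in range(5000):
    resid = mask * (L @ R.T - X)       # residual on observed entries only
    L, R = L - step * (resid @ R + mu * L), R - step * (resid.T @ L + mu * R)

rel_err = np.linalg.norm(L @ R.T - X) / np.linalg.norm(X)
print(rel_err)
```

With enough observed entries relative to the intrinsic dimension r(k + n - r), the unobserved entries are filled in as well.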
SLIDE 16

First theory result

  • If m > c0 r(k + n - r) log(kn), the heuristic succeeds for most A
  • Number of measurements: c0 r(k + n - r) log(kn) = constant x intrinsic dimension x log(ambient dimension)
  • Approach: Show that a random A is nearly an isometry on the manifold of low-rank matrices
  • Stable to noise in the measurement vector b; returns as good an answer as a truncated SVD of the true X

Recht, Fazel, and Parrilo. 2007.

SLIDE 17

Low-rank Matrix Completion

  • How do you fill in the missing data?

X_ij known for black cells, X_ij unknown for white cells.

SLIDE 18

Which matrices?

  • Any subset of entries that misses the (1,1) component tells you nothing!
  • Still need to see the entire first row
  • Want each entry to provide nearly the same amount of information

SLIDE 19

Incoherence

  • Let U be a subspace of R^n of dimension r and P_U be the orthogonal projection onto U. Then the coherence of U (with respect to the standard basis e_i) is defined to be μ(U) = (n/r) max_i ||P_U e_i||^2
  • μ(U) ≥ 1
    – e.g., attained by the span of r columns of the Fourier transform
  • μ(U) ≤ n/r
    – e.g., attained by any subspace that contains a standard basis element
  • μ(U) = O(1)
    – for a subspace sampled from the uniform distribution with r > log n
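The coherence can be computed directly from an orthonormal basis of U, and both extremes above can be checked (the basis matrices below are illustrative examples):

```python
import numpy as np

def coherence(U):
    """mu(U) = (n/r) * max_i ||P_U e_i||^2 for an n x r orthonormal basis U."""
    n, r = U.shape
    # For an orthonormal basis, ||P_U e_i||^2 is the squared norm of row i
    return (n / r) * (U ** 2).sum(axis=1).max()

n, r = 8, 2
E = np.eye(n)[:, :r]               # contains standard basis vectors: maximal
H = np.ones((n, r)) / np.sqrt(n)   # perfectly flat rows: minimal coherence
H[:, 1] = np.array([1, -1] * (n // 2)) / np.sqrt(n)
print(coherence(E), coherence(H))
```

Here coherence(E) = n/r = 4 (the upper bound) and coherence(H) = 1 (the lower bound).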

SLIDE 20

Matrix Completion

  • Suppose X is k x n (k ≤ n), has rank r, and has row and column spaces with coherence bounded above by μ0. Then the nuclear norm heuristic recovers X from most subsets of entries Ω with cardinality at least C μ0 n^{5/4} r log n
  • If, in addition, r ≤ μ0^{-1} n^{1/5}, then C μ0 n^{6/5} r log n entries suffice. Candès and Recht. 2008

SLIDE 21

Proof Tools

  • Convex Analysis
    – KKT conditions: find a dual certificate proving the minimum nuclear norm solution is the hidden low-rank matrix
    – Compressed sensing: use an ansatz for the multiplier and bound its norm
  • Probability on Banach Spaces
    – Moment bounds for norms of matrix-valued random variables [Rudelson]
    – Decoupling [Bourgain-Tzafriri; de la Peña et al.]: indicator variables can be treated as independent
    – Non-commutative Khintchine inequality [Lust-Piquard]: tightly bound the operator norm in terms of the largest entry

SLIDE 22

Gradient descent

  • Gradient descent on the low-rank nuclear norm parameterization
  • A mixture of hundreds of models, including nuclear norm

SLIDE 23

Parsimonious Modeling: A road map

  • Open problems in rank minimization: optimal bounds, noise performance, faster algorithms, more mining of connections with compressed sensing
  • Expanding the parsimony catalog: dynamical systems, nonlinear models, tensors, completely positive matrices, Jordan algebras, and beyond
  • Automatic parsimonious programming: computational complexity of norms, algorithm and proof generation
  • Broad applied impact: data mining time series in biology, medicine, social networks, and human-computer interfaces

SLIDE 24

Acknowledgements

  • See http://www.ist.caltech.edu/~brecht/publications.html for all references
  • Results developed in collaboration with Emmanuel Candès, John Doyle, Babak Hassibi, and Weiyu Xu at Caltech, Ali Rahimi at Intel Research, Harvey Cohen, Lawrence Recht, and John Whitin at Stanford, Maryam Fazel at U Washington, and Pablo Parrilo and the RealNose team at MIT.