

  1. Matrix Completion and Matrix Concentration
     Lester Mackey, Ameet Talwalkar, Michael I. Jordan (University of California, Berkeley)
     Richard Chen, Brendan Farrell, Joel Tropp (Caltech)
     October 8, 2012

  2. Part I: Divide-Factor-Combine

  3. Introduction: Motivation: Large-scale Matrix Completion
     Goal: Estimate a matrix L₀ ∈ R^{m×n} given a subset of its entries:
     $$\begin{pmatrix} ? & ? & 1 & \cdots & 4 \\ 3 & ? & ? & \cdots & ? \\ ? & 5 & ? & \cdots & 5 \end{pmatrix} \rightarrow \begin{pmatrix} 2 & 3 & 1 & \cdots & 4 \\ 3 & 4 & 5 & \cdots & 1 \\ 2 & 5 & 3 & \cdots & 5 \end{pmatrix}$$
     Examples:
     - Collaborative filtering: How will user i rate movie j? (Netflix: 10 million users, 100K DVD titles)
     - Ranking on the web: Is URL j relevant to user i? (Google News: millions of articles, millions of users)
     - Link prediction: Is user i friends with user j? (Facebook: 500 million users)

  4. Introduction: Motivation: Large-scale Matrix Completion
     Goal: Estimate a matrix L₀ ∈ R^{m×n} given a subset of its entries:
     $$\begin{pmatrix} ? & ? & 1 & \cdots & 4 \\ 3 & ? & ? & \cdots & ? \\ ? & 5 & ? & \cdots & 5 \end{pmatrix} \rightarrow \begin{pmatrix} 2 & 3 & 1 & \cdots & 4 \\ 3 & 4 & 5 & \cdots & 1 \\ 2 & 5 & 3 & \cdots & 5 \end{pmatrix}$$
     State-of-the-art MC algorithms: strong estimation guarantees, but plagued by expensive subroutines (e.g., truncated SVD).
     This talk: Present divide-and-conquer approaches for scaling up any MC algorithm while maintaining strong estimation guarantees.

  5. Matrix Completion Background: Exact Matrix Completion
     Goal: Estimate a matrix L₀ ∈ R^{m×n} given a subset of its entries.

  6. Matrix Completion Background: Noisy Matrix Completion
     Goal: Given entries from a matrix M = L₀ + Z ∈ R^{m×n}, where Z is entrywise noise and L₀ has rank r ≪ m, n, estimate L₀.
     Good news: L₀ has ∼ (m + n)r ≪ mn degrees of freedom.
     Factored form: L₀ = AB^⊤ for A ∈ R^{m×r} and B ∈ R^{n×r}.
     Bad news: Not all low-rank matrices can be recovered.
     Question: What can go wrong?
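     To make the degrees-of-freedom count concrete, here is a minimal numpy sketch (not from the talk; the variable names are illustrative) that builds a rank-r matrix in factored form:

     ```python
     import numpy as np

     m, n, r = 500, 400, 5
     rng = np.random.default_rng(0)

     # Factored form: L0 = A B^T with A of size m x r and B of size n x r
     A = rng.standard_normal((m, r))
     B = rng.standard_normal((n, r))
     L0 = A @ B.T

     # (m + n) * r parameters describe L0, far fewer than its m * n entries
     print("degrees of freedom:", (m + n) * r)   # 4500
     print("total entries:", m * n)              # 200000
     print("rank:", np.linalg.matrix_rank(L0))   # 5
     ```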

  7. Matrix Completion Background: What can go wrong? Entire column missing:
     $$\begin{pmatrix} 1 & 2 & ? & 3 & \cdots & 4 \\ 3 & 5 & ? & 4 & \cdots & 1 \\ 2 & 5 & ? & 2 & \cdots & 5 \end{pmatrix}$$
     No hope of recovery!
     Solution: Uniform observation model. Assume that the set of s observed entries Ω is drawn uniformly at random: Ω ∼ Unif(m, n, s).
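     A minimal sketch of this observation model (the helper name is mine), drawing s entry positions uniformly at random without replacement:

     ```python
     import numpy as np

     def sample_omega(m, n, s, rng):
         """Draw Omega ~ Unif(m, n, s): s observed positions, uniform without replacement."""
         flat = rng.choice(m * n, size=s, replace=False)
         return np.unravel_index(flat, (m, n))   # (row indices, column indices)

     rng = np.random.default_rng(0)
     rows, cols = sample_omega(500, 400, s=20_000, rng=rng)
     mask = np.zeros((500, 400), dtype=bool)   # mask[i, j] is True iff (i, j) is observed
     mask[rows, cols] = True
     ```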

  8. Matrix Completion Background: What can go wrong? Bad spread of information:
     $$L = e_1 e_1^\top = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
     Can only recover L if L₁₁ is observed.
     Solution: Incoherence with the standard basis (Candès and Recht, 2009). A matrix L = UΣV^⊤ ∈ R^{m×n} with rank(L) = r is (µ, r)-coherent if its singular vectors are not too sparse,
     $$\max_i \|UU^\top e_i\|^2 \le \mu r/m, \qquad \max_i \|VV^\top e_i\|^2 \le \mu r/n,$$
     and not too cross-correlated,
     $$\|UV^\top\|_\infty \le \sqrt{\mu r / (mn)}.$$
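     The coherence conditions are easy to check numerically. Here is a sketch (function name and interface are mine) that computes the smallest µ for which a rank-r matrix satisfies all three conditions:

     ```python
     import numpy as np

     def coherence_mu(L, r):
         """Smallest mu for which L satisfies the (mu, r)-coherence conditions."""
         m, n = L.shape
         U, _, Vt = np.linalg.svd(L, full_matrices=False)
         U, V = U[:, :r], Vt[:r, :].T
         # ||U U^T e_i||^2 equals the squared norm of row i of U (orthonormal columns)
         mu_U = (m / r) * np.max(np.sum(U**2, axis=1))
         mu_V = (n / r) * np.max(np.sum(V**2, axis=1))
         # ||U V^T||_inf <= sqrt(mu r / (m n))  =>  mu >= m n ||U V^T||_inf^2 / r
         mu_cross = (m * n / r) * np.max(np.abs(U @ V.T)) ** 2
         return max(mu_U, mu_V, mu_cross)
     ```

     For example, coherence_mu of e₁e₁^⊤ is maximal (its singular vectors are as sparse as possible), while a random low-rank matrix is incoherent with high probability.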

  9. Matrix Completion Background: How do we estimate L₀?
     First attempt:
     $$\min_A \ \operatorname{rank}(A) \quad \text{subject to} \quad \sum_{(i,j)\in\Omega}(A_{ij}-M_{ij})^2 \le \Delta^2$$
     Problem: Intractable to solve!
     Solution: Solve a convex relaxation (Fazel, Hindi, and Boyd, 2001; Candès and Plan, 2010):
     $$\min_A \ \|A\|_* \quad \text{subject to} \quad \sum_{(i,j)\in\Omega}(A_{ij}-M_{ij})^2 \le \Delta^2$$
     where ‖A‖_* = Σ_k σ_k(A) is the trace/nuclear norm of A.
     Questions: Will the nuclear norm heuristic successfully recover L₀? Can nuclear norm minimization scale to large MC problems?
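     As a small-scale illustration, the relaxation can be written directly in CVXPY; this is a hedged sketch (not the solver used in the talk):

     ```python
     import cvxpy as cp
     import numpy as np

     def nuclear_norm_complete(M, mask, delta):
         """min ||A||_* subject to sum over observed (i,j) of (A_ij - M_ij)^2 <= delta^2."""
         A = cp.Variable(M.shape)
         residual = cp.multiply(mask.astype(float), A - M)   # zero out unobserved entries
         problem = cp.Problem(cp.Minimize(cp.norm(A, "nuc")),
                              [cp.sum_squares(residual) <= delta**2])
         problem.solve()
         return A.value
     ```

     Generic conic solvers handle this only for small matrices, which is exactly the scaling issue taken up on the next slide.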

  10. Matrix Completion Background: Noisy Nuclear Norm Heuristic: Does it work? Yes, with high probability.
      Typical theorem: If L₀ is (µ, r)-coherent, s = O(µrn log²(n)) entries of M ∈ R^{m×n} are observed uniformly at random, and L̂ solves the noisy nuclear norm heuristic, then ‖L̂ − L₀‖_F ≤ f(m, n)∆ with high probability when ‖M − L₀‖_F ≤ ∆.
      See Candès and Plan (2010); Mackey, Talwalkar, and Jordan (2011); Keshavan, Montanari, and Oh (2010); Negahban and Wainwright (2010).
      Implies exact recovery in the noiseless setting (∆ = 0).

  11. Matrix Completion Background: Noisy Nuclear Norm Heuristic: Does it scale? Not quite...
      Standard interior point methods (Candès and Recht, 2009): O(|Ω|(m + n)³ + |Ω|²(m + n)² + |Ω|³).
      More efficient, tailored algorithms:
      - Singular Value Thresholding (SVT) (Cai, Candès, and Shen, 2010)
      - Augmented Lagrange Multiplier (ALM) (Lin, Chen, Wu, and Ma, 2009)
      - Accelerated Proximal Gradient (APG) (Toh and Yun, 2010)
      All require a rank-k truncated SVD on every iteration (see the SVT sketch below).
      Take-away: Provably accurate MC algorithms are still too expensive for large-scale or real-time matrix completion.
      Question: How can we scale up a given matrix completion algorithm and still retain estimation guarantees?
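      For reference, a compact sketch of the SVT iteration (following Cai, Candès, and Shen, 2010; the threshold and step-size defaults are simplified versions of the paper's heuristics). Note the SVD inside the loop, which is the expensive subroutine:

      ```python
      import numpy as np

      def svt_complete(M, mask, tau=None, n_iters=200):
          """Singular Value Thresholding sketch: soft-threshold singular values,
          then take a gradient step on the observed entries."""
          m, n = M.shape
          tau = tau if tau is not None else 5 * np.sqrt(m * n)
          delta = 1.2 * (m * n) / mask.sum()      # step size scaled as in the SVT paper
          Y = np.zeros((m, n))
          for _ in range(n_iters):
              U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # the per-iteration SVD
              X = (U * np.maximum(s - tau, 0)) @ Vt              # shrinkage operator
              Y += delta * mask * (M - X)                        # fit the observed entries
          return X
      ```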

  12. Matrix Completion DFC: Divide-Factor-Combine (DFC)
      Our solution: divide and conquer.
      1. Divide M into submatrices.
      2. Factor each submatrix in parallel.
      3. Combine submatrix estimates to estimate L₀.
      Advantages:
      - Factoring a submatrix is often much cheaper than factoring M.
      - Multiple submatrix factorizations can be carried out in parallel.
      - DFC works with any base MC algorithm.
      - With the right choice of division and recombination, yields estimation guarantees comparable to those of the base algorithm.

  13. Matrix Completion DFC: DFC-Proj: Partition and Project
      1. Randomly partition M into n/l column submatrices, M = [C₁ C₂ ⋯ C_{n/l}], where each Cᵢ ∈ R^{m×l}.
      2. Complete the submatrices in parallel to obtain [Ĉ₁ Ĉ₂ ⋯ Ĉ_{n/l}].
         Reduced cost: expect a min(n/l, m/d) speed-up per iteration. Parallel computation: pay the cost of one cheaper MC.
      3. Recover a single factorization for M by projecting each submatrix onto the column space of Ĉ₁:
         $$\hat{L}^{\mathrm{proj}} = \hat{C}_1 \hat{C}_1^{+} \, [\hat{C}_1 \ \hat{C}_2 \ \cdots \ \hat{C}_{n/l}]$$
         Minimal cost: O(mk² + lk²) where k = rank(L̂_proj).
      4. Ensemble: project onto the column space of each Ĉⱼ and average. (A code sketch of steps 1-3 follows below.)
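      A minimal sketch of DFC-Proj (the interface is mine; base_mc stands in for any base completion algorithm, e.g. an SVT or APG solver):

      ```python
      import numpy as np

      def dfc_proj(M, mask, l, base_mc, rng):
          """DFC-Proj sketch: partition columns, complete each block with the base
          MC algorithm, then project all blocks onto the column space of the first."""
          m, n = M.shape
          perm = rng.permutation(n)
          blocks = [perm[i:i + l] for i in range(0, n, l)]
          # Step 2: complete each column submatrix (embarrassingly parallel in practice)
          C_hat = [base_mc(M[:, idx], mask[:, idx]) for idx in blocks]
          # Step 3: Q Q^T equals C1 C1^+ when C_hat[0] has full column rank
          Q, _ = np.linalg.qr(C_hat[0])
          L_hat = np.empty((m, n))
          for idx, C in zip(blocks, C_hat):
              L_hat[:, idx] = Q @ (Q.T @ C)   # project block onto col space of C_hat[0]
          return L_hat
      ```

      The ensemble variant (step 4) repeats the projection with each Ĉⱼ as the basis and averages the resulting estimates.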

  14. Matrix Completion DFC: Does it work? Yes, with high probability.
      Theorem (Mackey, Talwalkar, and Jordan, 2011): If L₀ is (µ, r)-coherent and s entries of M ∈ R^{m×n} are observed uniformly at random, then
      $$l = O\left(\frac{\mu^2 r^2 n^2 \log^2(n)}{s\,\epsilon^2}\right)$$
      random columns suffice to have ‖L̂_proj − L₀‖_F ≤ (2 + ε) f(m, n)∆ with high probability when ‖M − L₀‖_F ≤ ∆ and the noisy nuclear norm heuristic is used as the base algorithm.
      Can sample a vanishingly small fraction of columns (l/n → 0) whenever s = ω(n log²(n)).
      Implies exact recovery in the noiseless (∆ = 0) setting.

  15. Matrix Completion DFC: Does it work? Yes, with high probability. Proof ideas:
      1. Uniform column/row sampling yields submatrices with low coherence (high spread of information) w.h.p.
      2. Each submatrix has sufficiently many observed entries w.h.p. ⇒ submatrix completion succeeds.
      3. Uniform sampling of columns/rows captures the full column/row space of L₀ w.h.p.; the noisy analysis builds on the randomized ℓ₂ regression work of Drineas, Mahoney, and Muthukrishnan (2008) ⇒ column projection succeeds.

  16. Matrix Completion Simulations: DFC Noisy Recovery Error
      [Figure: RMSE vs. percentage of revealed entries for Base-MC, Part-10%, Proj-10%, Nys-10%, Proj-Ens-10%, Nys-Ens-10%, and Proj-Ens-25%.]
      Figure: Recovery error of DFC relative to base algorithms (m = 10K, r = 10).

  17. Matrix Completion Simulations: DFC Speed-up
      [Figure: running time (s) vs. matrix dimension m (×10⁴) for Base-MC, Part-10%, Proj-10%, Nys-10%, Proj-Ens-10%, and Nys-Ens-10%.]
      Figure: Speed-up over APG for random matrices with r = 0.001m and 4% of entries revealed.

  18. Matrix Completion CF: Application: Collaborative filtering
      Task: Given a sparsely observed matrix of user-item ratings, predict the unobserved ratings.
      Issues: full-rank rating matrix; noisy, non-uniform observations.
      The data: Netflix Prize dataset¹: 100 million ratings in {1, ..., 5}; 17,770 movies, 480,189 users.
      ¹ http://www.netflixprize.com/
