SLIDE 1

Matrix Completion and Matrix Concentration

Lester Mackey†

Collaborators: Ameet Talwalkar‡, Michael I. Jordan††, Richard Y. Chen∗, Brendan Farrell∗, Joel A. Tropp∗, and Daniel Paulin∗∗

†Stanford University ‡UCLA ††UC Berkeley ∗California Institute of Technology ∗∗National University of Singapore

February 9, 2016

Mackey (Stanford) Matrix Completion and Concentration February 9, 2016 1 / 43

SLIDE 2

Part I Divide-Factor-Combine

SLIDE 3

Introduction

Motivation: Large-scale Matrix Completion

Goal: Estimate a matrix L0 ∈ R^{m×n} given a subset of its entries:

    [ ? ? 1 … 4 ]      [ 2 3 1 … 4 ]
    [ 3 ? ? … ? ]  →   [ 3 4 5 … 1 ]
    [ ? 5 ? … 5 ]      [ 2 5 3 … 5 ]

Examples
- Collaborative filtering: How will user i rate movie j? (Netflix: 40 million users, 200K movies and television shows)
- Ranking on the web: Is URL j relevant to user i? (Google News: millions of articles, 1 billion users)
- Link prediction: Is user i friends with user j? (Facebook: 1.5 billion users)

SLIDE 4

Introduction

Motivation: Large-scale Matrix Completion

Goal: Estimate a matrix L0 ∈ R^{m×n} given a subset of its entries:

    [ ? ? 1 … 4 ]      [ 2 3 1 … 4 ]
    [ 3 ? ? … ? ]  →   [ 3 4 5 … 1 ]
    [ ? 5 ? … 5 ]      [ 2 5 3 … 5 ]

State-of-the-art MC algorithms
- Strong estimation guarantees
- Plagued by expensive subroutines (e.g., truncated SVD)

This talk
- Present divide-and-conquer approaches for scaling up any MC algorithm while maintaining strong estimation guarantees

SLIDE 5

Matrix Completion Background

Exact Matrix Completion

Goal: Estimate a matrix L0 ∈ Rm×n given a subset of its entries

SLIDE 6

Matrix Completion Background

Noisy Matrix Completion

Goal: Given entries from a matrix M = L0 + Z ∈ R^{m×n}, where Z is entrywise noise and L0 has rank r ≪ m, n, estimate L0.

Good news: L0 has ∼ (m + n)r ≪ mn degrees of freedom.

Factored form: L0 = AB⊤ for A ∈ R^{m×r} and B ∈ R^{n×r}.

Bad news: Not all low-rank matrices can be recovered.
Question: What can go wrong?
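The degrees-of-freedom count above is easy to see in code. A minimal sketch (not from the talk): build a rank-r matrix in the factored form AB⊤ and compare its parameter count to the mn entries it fills.

```python
import numpy as np

# Build a rank-r matrix L0 = A B^T and confirm the claimed rank and
# ~(m + n) r degrees of freedom versus the m n entries it determines.
rng = np.random.default_rng(0)
m, n, r = 50, 40, 3
A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))
L0 = A @ B.T

print(np.linalg.matrix_rank(L0))   # 3
print((m + n) * r, "parameters for", m * n, "entries")
```

The factors A and B together hold (m + n)r numbers, far fewer than the mn entries of L0, which is why recovery from a subset of entries is plausible at all.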

SLIDE 7

Matrix Completion Background

What can go wrong?

Entire column missing:

    [ 1 2 ? 3 … 4 ]
    [ 3 5 ? 4 … 1 ]
    [ 2 5 ? 2 … 5 ]

No hope of recovery!

Standard solution: Uniform observation model. Assume that the set of s observed entries Ω is drawn uniformly at random: Ω ∼ Unif(m, n, s). Can be relaxed to non-uniform row and column sampling (Negahban and Wainwright, 2010).

SLIDE 8

Matrix Completion Background

What can go wrong?

Bad spread of information:

    L = e1 e1⊤ = [ 1 0 … 0 ]
                 [ 0 0 … 0 ]
                 [ ⋮      ⋱ ]

Can only recover L if L11 is observed.

Standard solution: Incoherence with the standard basis (Candès and Recht, 2009). A matrix L = UΣV⊤ ∈ R^{m×n} with rank(L) = r is incoherent if its singular vectors are not too skewed:

    max_i ‖UU⊤e_i‖² ≤ μr/m    and    max_i ‖VV⊤e_i‖² ≤ μr/n

and not too cross-correlated:

    ‖UV⊤‖_∞ ≤ √(μr/(mn)).

(In this literature, it's good to be incoherent.)
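The skewness conditions above can be checked numerically. A sketch (illustrative, not from the slides): estimate the smallest coherence parameter μ for which the two leverage-score bounds hold, using the fact that ‖UU⊤e_i‖² = ‖U⊤e_i‖² when U has orthonormal columns.

```python
import numpy as np

# Estimate the coherence parameter mu of a rank-r matrix from its SVD:
# mu = max over rows of the scaled leverage scores of U and V.
rng = np.random.default_rng(1)
m, n, r = 60, 40, 4
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r

U, s, Vt = np.linalg.svd(L, full_matrices=False)
U, V = U[:, :r], Vt[:r, :].T
row_lev_U = np.sum(U**2, axis=1)   # ||U U^T e_i||^2 for each row i
row_lev_V = np.sum(V**2, axis=1)   # ||V V^T e_j||^2 for each row j
mu = max(row_lev_U.max() * m / r, row_lev_V.max() * n / r)
print(mu)  # close to 1 for random subspaces; as large as m/r in the worst case
```

Random subspaces are nearly maximally incoherent (μ near 1), while the e1 e1⊤ example above attains the worst case μ = m/r.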

SLIDE 9

Matrix Completion Background

How do we estimate L0?

First attempt:

    minimize_A  rank(A)
    subject to  Σ_{(i,j)∈Ω} (A_ij − M_ij)² ≤ Δ²

Problem: Computationally intractable!

Solution: Solve a convex relaxation (Fazel, Hindi, and Boyd, 2001; Candès and Plan, 2010):

    minimize_A  ‖A‖_*
    subject to  Σ_{(i,j)∈Ω} (A_ij − M_ij)² ≤ Δ²

where ‖A‖_* = Σ_k σ_k(A) is the trace/nuclear norm of A.

Questions: Will the nuclear norm heuristic successfully recover L0? Can nuclear norm minimization scale to large MC problems?
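To make the heuristic concrete, here is a minimal sketch of a proximal-gradient solver for the closely related Lagrangian form min_A ½‖P_Ω(A − M)‖_F² + τ‖A‖_* (this is an illustration in the spirit of the SVT/APG methods mentioned later, not the talk's own implementation; the step size and τ are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 40, 30, 2
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.5    # observed entries Omega
M = L0 * mask                      # zeros off Omega

def shrink(X, tau):
    """Soft-threshold the singular values of X (prox of tau * ||.||_*)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

A = np.zeros((m, n))
for _ in range(300):
    # Gradient step on the observed entries, then nuclear-norm prox.
    A = shrink(A + mask * (M - A), tau=0.5)

err = np.linalg.norm(A - L0) / np.linalg.norm(L0)
print(err)
```

Each iteration calls a full SVD, which is exactly the expensive subroutine the next slides complain about.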

SLIDE 10

Matrix Completion Background

Noisy Nuclear Norm Heuristic: Does it work?

Yes, with high probability.

Typical Theorem: If L0 with rank r is incoherent, s ≳ rn log²(n) entries of M ∈ R^{m×n} are observed uniformly at random, and L̂ solves the noisy nuclear norm heuristic, then

    ‖L̂ − L0‖_F ≤ f(m, n)Δ

with high probability when ‖M − L0‖_F ≤ Δ.

See Candès and Plan (2010); Mackey, Talwalkar, and Jordan (2011); Keshavan, Montanari, and Oh (2010); Negahban and Wainwright (2010).

Implies exact recovery in the noiseless setting (Δ = 0).

SLIDE 11

Matrix Completion Background

Noisy Nuclear Norm Heuristic: Does it scale?

Not quite...

Standard interior point methods (Candès and Recht, 2009): O(|Ω|(m + n)³ + |Ω|²(m + n)² + |Ω|³)

More efficient, tailored algorithms:
- Singular Value Thresholding (SVT) (Cai, Candès, and Shen, 2010)
- Augmented Lagrange Multiplier (ALM) (Lin, Chen, Wu, and Ma, 2009)
- Accelerated Proximal Gradient (APG) (Toh and Yun, 2010)

All require a rank-k truncated SVD on every iteration.

Take-away: These provably accurate MC algorithms are too expensive for large-scale or real-time matrix completion.
Question: How can we scale up a given matrix completion algorithm and still retain estimation guarantees?

SLIDE 12

Matrix Completion DFC

Divide-Factor-Combine (DFC)

Our Solution: Divide and conquer

1. Divide M into submatrices.
2. Complete each submatrix in parallel.
3. Combine the submatrix estimates, using techniques from randomized low-rank approximation.

Advantages
- Completing a submatrix is often much cheaper than completing M
- Multiple submatrix completions can be carried out in parallel
- DFC works with any base MC algorithm
- The right choices of division and recombination yield estimation guarantees comparable to those of the base algorithm

SLIDE 13

Matrix Completion DFC

DFC-Proj: Partition and Project

1. Randomly partition M into t column submatrices, M = [C1 C2 ⋯ Ct], where each Ci ∈ R^{m×l}.
2. Complete the submatrices in parallel to obtain Ĉ1, Ĉ2, …, Ĉt.
   - Reduced cost: Expect a t-fold speed-up per iteration
   - Parallel computation: Pay the cost of one cheaper MC
3. Project the submatrices onto a single low-dimensional column space: estimate the column space of L0 with the column space of Ĉ1,

       L̂_proj = Ĉ1 Ĉ1⁺ [Ĉ1 Ĉ2 ⋯ Ĉt].

   - Common technique for randomized low-rank approximation (Frieze, Kannan, and Vempala, 1998)
   - Minimal cost: O(mk² + lk²) where k = rank(L̂_proj)
4. Ensemble: Project onto the column space of each Ĉj and average.
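The combine step above can be sketched in a few lines. In this illustration the base MC algorithm is stubbed out by an oracle (each Ĉi is taken to be exact); in practice each block would come from running any completion solver on its observed entries.

```python
import numpy as np

# DFC-Proj combine step: partition columns, "complete" each block (stubbed
# as exact here), then project every block onto the column space of C_hat_1.
rng = np.random.default_rng(3)
m, n, r, t = 30, 40, 2, 4
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

perm = rng.permutation(n)
blocks = np.array_split(perm, t)            # random column partition
C_hats = [L0[:, b] for b in blocks]         # stub: perfect block completion

C1 = C_hats[0]
P = C1 @ np.linalg.pinv(C1)                 # projector C1 C1^+
L_proj = np.hstack([P @ C for C in C_hats])

# Undo the permutation and compare to L0.
L_est = np.empty_like(L0)
L_est[:, np.concatenate(blocks)] = L_proj
print(np.linalg.norm(L_est - L0))
```

With exact blocks the first block's l = 10 random columns already span the rank-2 column space of L0, so the projection reproduces L0; the theorem on the next slide quantifies how this degrades under noisy completion.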

SLIDE 14

Matrix Completion DFC

DFC: Does it work?

Yes, with high probability.

Theorem (Mackey, Talwalkar, and Jordan, 2014b): If L0 with rank r is incoherent and s = ω(r²n log²(n)/ε²) entries of M ∈ R^{m×n} are observed uniformly at random, then l = o(n) random columns suffice to have

    ‖L̂_proj − L0‖_F ≤ (2 + ε)f(m, n)Δ

with high probability when ‖M − L0‖_F ≤ Δ and the noisy nuclear norm heuristic is used as the base algorithm.

- Can sample a vanishingly small fraction of columns (l/n → 0)
- Implies exact recovery in the noiseless setting (Δ = 0)
- Analysis streamlined by a matrix Bernstein inequality

SLIDE 15

Matrix Completion DFC

DFC: Does it work?

Yes, with high probability. Proof ideas:

1. If L0 is incoherent (has good spread of information), its partitioned submatrices are incoherent w.h.p.
2. Each submatrix has sufficiently many observed entries w.h.p. ⇒ submatrix completion succeeds.
3. A random submatrix captures the full column space of L0 w.h.p. ⇒ column projection succeeds.

Analysis builds on the randomized ℓ2 regression work of Drineas, Mahoney, and Muthukrishnan (2008).

SLIDE 16

Matrix Completion Simulations

DFC Noisy Recovery Error

Figure: Recovery error of DFC relative to the base algorithm (APG) with m = 10K and r = 10. (Plot of MC RMSE versus % of revealed entries for Proj-10%, Proj-Ens-10%, and Base-MC.)

SLIDE 17

Matrix Completion Simulations

DFC Speed-up

Figure: Speed-up over the base algorithm (APG) for random matrices with r = 0.001m and 4% of entries revealed. (Plot of MC time in seconds versus m for Proj-10%, Proj-Ens-10%, and Base-MC.)

SLIDE 18

Matrix Completion CF

Application: Collaborative filtering

Task: Given a sparsely observed matrix of user-item ratings, predict the unobserved ratings.

Issues
- Full-rank rating matrix
- Noisy, non-uniform observations

The Data: Netflix Prize dataset¹
- 100 million ratings in {1, …, 5}
- 17,770 movies, 480,189 users

¹http://www.netflixprize.com/

SLIDE 19

Matrix Completion CF

Application: Collaborative filtering

Task: Predict unobserved user-item ratings.

    Method              Netflix RMSE   Time
    Base method (APG)   0.8433         2653.1s
    DFC-Proj-25%        0.8436         689.5s
    DFC-Proj-10%        0.8484         289.7s
    DFC-Proj-Ens-25%    0.8411         689.5s
    DFC-Proj-Ens-10%    0.8433         289.7s

SLIDE 20

Future Directions

Future Directions

New Applications and Datasets
- Practical structured recovery problems with large-scale or real-time requirements
- Video background modeling via robust matrix factorization (Mackey, Talwalkar, and Jordan, 2014b)
- Image tagging / video event detection via subspace segmentation (Talwalkar, Mackey, Mu, Chang, and Jordan, 2013)

New Divide-and-Conquer Strategies
- Other ways to reduce computation while preserving accuracy
- More extensive use of ensembling

SLIDE 21

Future Directions

DFC-Nys: Generalized Nyström Decomposition

1. Choose a random column submatrix C ∈ R^{m×l} and a random row submatrix R ∈ R^{d×n} from M. Call their intersection W:

       M = [ W    M12 ]      C = [ W   ]      R = [ W  M12 ]
           [ M21  M22 ]          [ M21 ]

2. Recover the low-rank components of C and R in parallel to obtain Ĉ and R̂.
3. Recover L0 from Ĉ, R̂, and their intersection Ŵ:

       L̂_nys = Ĉ Ŵ⁺ R̂.

   - Generalized Nyström method (Goreinov, Tyrtyshnikov, and Zamarashkin, 1997)
   - Minimal cost: O(mk² + lk² + dk²) where k = rank(L̂_nys)
4. Ensemble: Run p times in parallel and average the estimates.
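The reconstruction L̂_nys = Ĉ Ŵ⁺ R̂ is easy to sanity-check on an exactly low-rank matrix. A sketch (completion of C and R stubbed out as exact): when the intersection W has the same rank as L0, the Nyström formula recovers L0 exactly.

```python
import numpy as np

# Generalized Nystrom reconstruction from random column and row submatrices.
rng = np.random.default_rng(4)
m, n, r = 30, 40, 3
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

cols = rng.choice(n, size=8, replace=False)   # l = 8 random columns
rows = rng.choice(m, size=6, replace=False)   # d = 6 random rows
C = L0[:, cols]                               # stub: exact completion
R = L0[rows, :]
W = L0[np.ix_(rows, cols)]                    # intersection of C and R

L_nys = C @ np.linalg.pinv(W) @ R
print(np.linalg.norm(L_nys - L0))
```

Exactness hinges on rank(W) = rank(L0) = r, which holds almost surely here since both index sets are larger than r.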

SLIDE 22

Future Directions

Future Directions

New Applications and Datasets
- Practical structured recovery problems with large-scale or real-time requirements

New Divide-and-Conquer Strategies
- Other ways to reduce computation while preserving accuracy
- More extensive use of ensembling

New Theory
- Analyze the statistical implications of divide-and-conquer algorithms
  - Trade-off between statistical and computational efficiency
  - Impact of ensembling
- Develop a suite of matrix concentration inequalities to aid in the analysis of randomized algorithms with matrix data

SLIDE 23

Part II Stein’s Method for Matrix Concentration

SLIDE 24

Motivation

Concentration Inequalities

Matrix concentration:

    P{‖X − E X‖ ≥ t} ≤ δ        P{λmax(X − E X) ≥ t} ≤ δ

Non-asymptotic control of random matrices with complex distributions.

Applications
- Matrix completion from sparse random measurements (Gross, 2011; Recht, 2011; Negahban and Wainwright, 2010; Mackey, Talwalkar, and Jordan, 2014b)
- Randomized matrix multiplication and factorization (Drineas, Mahoney, and Muthukrishnan, 2008; Hsu, Kakade, and Zhang, 2011)
- Convex relaxation of robust or chance-constrained optimization (Nemirovski, 2007; So, 2011; Cheung, So, and Wang, 2011)
- Random graph analysis (Christofides and Markström, 2008; Oliveira, 2009)

SLIDE 25

Motivation

Concentration Inequalities

Matrix concentration: P{λmax(X − E X) ≥ t} ≤ δ

Difficulty: Matrix multiplication is not commutative, so in general e^{X+Y} ≠ e^X e^Y ≠ e^Y e^X.

Past approaches (Ahlswede and Winter, 2002; Oliveira, 2009; Tropp, 2011)
- Rely on deep results from matrix analysis
- Apply to sums of independent matrices and matrix martingales

Our work (Mackey, Jordan, Chen, Farrell, and Tropp, 2014a; Paulin, Mackey, and Tropp, 2016)
- Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration
- ⇒ Improved exponential tail inequalities (Hoeffding, Bernstein, bounded differences)
- ⇒ Polynomial moment inequalities (Khintchine, Rosenthal)
- ⇒ Dependent sums and more general matrix functionals

SLIDE 26

Motivation

Roadmap

4. Motivation
5. Stein's Method Background and Notation
6. Exponential Tail Inequalities
7. Polynomial Moment Inequalities
8. Extensions

SLIDE 27

Background

Notation

Hermitian matrices: H^d = {A ∈ C^{d×d} : A = A*}. All matrices in this talk are Hermitian.
Maximum eigenvalue: λmax(·)
Trace: tr B, the sum of the diagonal entries of B
Spectral norm: ‖B‖, the maximum singular value of B

SLIDE 28

Background

Matrix Stein Pair

Definition (Exchangeable Pair): (Z, Z′) is an exchangeable pair if (Z, Z′) =_d (Z′, Z).

Definition (Matrix Stein Pair): Let (Z, Z′) be an exchangeable pair, and let Ψ : Z → H^d be a measurable function. Define the random matrices X := Ψ(Z) and X′ := Ψ(Z′). (X, X′) is a matrix Stein pair with scale factor α ∈ (0, 1] if

    E[X′ | Z] = (1 − α)X.

- Matrix Stein pairs are exchangeable pairs
- Matrix Stein pairs always have zero mean
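A concrete example (illustrative, not drawn from the slides): for the Rademacher sum X = Σ_k ε_k A_k, letting Z′ be Z with one uniformly chosen sign resampled gives a matrix Stein pair with α = 1/k. The defining identity E[X′ | Z] = (1 − α)X can be verified by exact enumeration over the resampled coordinate and the fresh sign.

```python
import numpy as np

# Verify E[X' | Z] = (1 - alpha) X for the Rademacher-sum Stein pair.
rng = np.random.default_rng(5)
k, d = 4, 3
A = [H + H.T for H in rng.standard_normal((k, d, d))]   # Hermitian A_k
eps = rng.choice([-1.0, 1.0], size=k)                   # a realization of Z
X = sum(e * Ak for e, Ak in zip(eps, A))

cond_exp = np.zeros((d, d))
for j in range(k):                 # coordinate J, uniform on {1, ..., k}
    for fresh in (-1.0, 1.0):      # fresh Rademacher sign, each w.p. 1/2
        Xp = X - eps[j] * A[j] + fresh * A[j]
        cond_exp += Xp / (2 * k)

alpha = 1.0 / k
print(np.allclose(cond_exp, (1 - alpha) * X))  # True
```

For this pair the conditional variance defined two slides ahead works out to ∆X = Σ_k A_k², which is what feeds the matrix Hoeffding inequality.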

SLIDE 29

Background

Method of Exchangeable Pairs

Why matrix Stein pairs? They furnish convenient expressions for the moments of X.

Lemma (Method of Exchangeable Pairs): Let (X, X′) be a matrix Stein pair with scale factor α, and let F : H^d → H^d be a measurable function with E‖(X − X′)F(X)‖ < ∞. Then

    E[X F(X)] = (1/(2α)) E[(X − X′)(F(X) − F(X′))].    (1)

Intuition
- Expressions like E[X e^{θX}] and E[X^p] arise naturally in concentration settings
- Eq. (1) allows us to bound these quantities using the smoothness properties of F and the discrepancy X − X′

SLIDE 30

Background

The Conditional Variance

Why matrix Stein pairs? They give rise to a measure of the spread of the distribution of X.

Definition (Conditional Variance): Suppose that (X, X′) is a matrix Stein pair with scale factor α, constructed from the exchangeable pair (Z, Z′). The conditional variance is the random matrix

    ∆X := ∆X(Z) := (1/(2α)) E[(X − X′)² | Z].

∆X is a stochastic estimate of the variance:

    E X² = (1/(2α)) E[(X − X′)²] = E ∆X.

Take-home message: Control over ∆X yields control over λmax(X).

SLIDE 31

Exponential Tail Inequalities

Exponential Concentration for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2014a): Let (X, X′) be a matrix Stein pair with X ∈ H^d. Suppose that ∆X ⪯ cX + vI almost surely for c, v ≥ 0. Then, for all t ≥ 0,

    P{λmax(X) ≥ t} ≤ d · exp(−t² / (2v + 2ct)).

- Control over the conditional variance ∆X yields a Gaussian tail for λmax(X) for small t and an exponential tail for large t
- When d = 1, reduces to the scalar result of Chatterjee (2007)
- The dimensional factor d cannot be removed

SLIDE 32

Exponential Tail Inequalities

Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2014a): Let X = Σ_k Y_k for independent matrices in H^d satisfying

    E Y_k = 0    and    Y_k² ⪯ A_k²

for deterministic matrices (A_k)_{k≥1}. Define the scale parameter σ² := ‖Σ_k A_k²‖. Then, for all t ≥ 0,

    P{λmax(Σ_k Y_k) ≥ t} ≤ d · e^{−t²/(2σ²)}.

- Improves upon the matrix Hoeffding inequality of Tropp (2011)
- Optimal constant 1/2 in the exponent
- Can replace the scale parameter with σ² = (1/2) ‖Σ_k (A_k² + E Y_k²)‖
- Tighter than the classical scalar Hoeffding inequality (1963)
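For a small Rademacher sum the tail probability can be computed exactly by enumerating all 2^k sign patterns, so the corollary can be checked without sampling (an illustration, with Y_k = ε_k A_k so that Y_k² = A_k² holds with equality).

```python
import itertools
import numpy as np

# Exact check of the matrix Hoeffding tail bound on a tiny Rademacher sum.
rng = np.random.default_rng(9)
k, d = 4, 3
A = [(H + H.T) / 2 for H in rng.standard_normal((k, d, d))]
sigma2 = np.linalg.eigvalsh(sum(Ak @ Ak for Ak in A)).max()  # ||sum A_k^2||

t = 1.5 * np.sqrt(sigma2)
lam = [np.linalg.eigvalsh(sum(e * Ak for e, Ak in zip(s, A))).max()
       for s in itertools.product([-1.0, 1.0], repeat=k)]
prob = np.mean([l >= t for l in lam])        # exact tail probability
bound = d * np.exp(-t**2 / (2 * sigma2))
print(prob <= bound)  # True
```

The enumeration is exact, so the comparison tests the theorem itself rather than a Monte Carlo estimate of it.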

SLIDE 33

Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

Step 1. Matrix Laplace transform method (Ahlswede and Winter, 2002). Relate the tail probability to the trace of the mgf of X:

    P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ),    where m(θ) := E tr e^{θX}.

How to bound the trace mgf?
- Past approaches: Golden-Thompson inequality, Lieb's concavity theorem
- Chatterjee's strategy for scalar concentration: control mgf growth by bounding the derivative

    m′(θ) = E tr[X e^{θX}]    for θ ∈ R.

Perfectly suited for rewriting using exchangeable pairs!

SLIDE 34

Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

Step 2. Method of Exchangeable Pairs. Rewrite the derivative of the trace mgf:

    m′(θ) = E tr[X e^{θX}] = (1/(2α)) E tr[(X − X′)(e^{θX} − e^{θX′})].

Goal: Use the smoothness of F(X) = e^{θX} to bound the derivative.

SLIDE 35

Exponential Tail Inequalities

Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2014a): Suppose that g : R → R is a weakly increasing function and that h : R → R has a convex derivative h′. For all matrices A, B ∈ H^d, it holds that

    tr[(g(A) − g(B)) · (h(A) − h(B))]
        ≤ (1/2) tr[(g(A) − g(B)) · (A − B) · (h′(A) + h′(B))].

Standard matrix functions: if g : R → R and A = Q diag(λ1, …, λd) Q*, then g(A) := Q diag(g(λ1), …, g(λd)) Q*.

- For exponential concentration we take g(A) = A and h(B) = e^{θB}
- The inequality does not hold without the trace
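A quick numerical check (not a proof) of the lemma in the case used for exponential concentration, g(A) = A and h(B) = e^{θB}, with the standard matrix function computed via eigendecomposition as defined above:

```python
import numpy as np

rng = np.random.default_rng(6)
d, theta = 4, 0.7

def herm(rng, d):
    """A random real symmetric (Hermitian) matrix."""
    H = rng.standard_normal((d, d))
    return (H + H.T) / 2

def fexp(A, t):
    """Standard matrix function e^{tA} via eigendecomposition."""
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.exp(t * lam)) @ Q.T

A, B = herm(rng, d), herm(rng, d)
lhs = np.trace((A - B) @ (fexp(A, theta) - fexp(B, theta)))
# h'(X) = theta * e^{theta X}
rhs = 0.5 * np.trace((A - B) @ (A - B) @ (theta * (fexp(A, theta) + fexp(B, theta))))
print(lhs <= rhs + 1e-10)  # True
```

Here g(x) = x is weakly increasing and h′(x) = θe^{θx} is convex for θ > 0, so the lemma's hypotheses hold.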

SLIDE 36

Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

Step 3. Mean Value Trace Inequality. Bound the derivative of the trace mgf:

    m′(θ) = (1/(2α)) E tr[(X − X′)(e^{θX} − e^{θX′})]
          ≤ (θ/(4α)) E tr[(X − X′)² · (e^{θX} + e^{θX′})]
          = (θ/(2α)) E tr[(X − X′)² · e^{θX}]
          = θ · E tr[(1/(2α)) E[(X − X′)² | Z] · e^{θX}]
          = θ · E tr[∆X e^{θX}].

SLIDE 37

Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

Step 3 gave the bound on the derivative of the trace mgf:

    m′(θ) ≤ θ · E tr[∆X e^{θX}].

Step 4. Conditional Variance Bound: ∆X ⪯ cX + vI yields the differential inequality

    m′(θ) ≤ cθ E tr[X e^{θX}] + vθ E tr[e^{θX}] = cθ · m′(θ) + vθ · m(θ).

Solve to bound m(θ) and thereby bound

    P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp(−t² / (2v + 2ct)).

SLIDE 38

Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2014a): Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair with E‖X‖_{2p}^{2p} < ∞. Then

    (E‖X‖_{2p}^{2p})^{1/(2p)} ≤ √(2p − 1) · (E‖∆X‖_p^p)^{1/(2p)}.

Moral: The conditional variance controls the moments of X.

- Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973); see also Pisier and Xu (1997); Junge and Xu (2003, 2008)
- Proof techniques mirror those for exponential concentration
- Also holds for infinite-dimensional Schatten-class operators

SLIDE 39

Polynomial Moment Inequalities

Application: Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2014a): Let (ε_k)_{k≥1} be an independent sequence of Rademacher random variables and (A_k)_{k≥1} a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,

    (E‖Σ_k ε_k A_k‖_{2p}^{2p})^{1/(2p)} ≤ √(2p − 1) · ‖(Σ_k A_k²)^{1/2}‖_{2p}.

- The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis, e.g., in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
- Stein's method offers an unusually concise proof
- The constant √(2p − 1) is within √e of optimal
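The inequality can be checked exactly for small k by enumerating all 2^k sign patterns, so the Rademacher expectation is computed without sampling (an illustration with Schatten norms computed from singular values):

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
k, d, p = 4, 3, 2          # p = 2, so the Schatten 2p = 4 norm

A = [(H + H.T) / 2 for H in rng.standard_normal((k, d, d))]  # Hermitian A_k

def schatten(M, q):
    """Schatten q-norm: l_q norm of the singular values."""
    return np.sum(np.linalg.svd(M, compute_uv=False) ** q) ** (1.0 / q)

moment = np.mean([
    schatten(sum(e * Ak for e, Ak in zip(signs, A)), 2 * p) ** (2 * p)
    for signs in itertools.product([-1.0, 1.0], repeat=k)
])
lhs = moment ** (1.0 / (2 * p))

lam, Q = np.linalg.eigh(sum(Ak @ Ak for Ak in A))
half = Q @ np.diag(np.sqrt(np.maximum(lam, 0))) @ Q.T   # (sum A_k^2)^{1/2}
rhs = np.sqrt(2 * p - 1) * schatten(half, 2 * p)
print(lhs <= rhs)  # True
```

Since the expectation over the signs is exact, the comparison exercises the corollary itself rather than a sampling estimate.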

SLIDE 40

Extensions

Extensions

Refined Exponential Concentration
- Relate the trace mgf of the conditional variance to the trace mgf of X
- Yields a matrix generalization of the classical Bernstein inequality
- Offers a tool for unbounded random matrices

General Complex Matrices
- Map any matrix B ∈ C^{d1×d2} to a Hermitian matrix via the dilation

      D(B) := [ 0   B ]  ∈ H^{d1+d2}.
              [ B*  0 ]

- Preserves spectral information: λmax(D(B)) = ‖B‖

Dependent Sequences
- Combinatorial matrix statistics (e.g., sampling without replacement)
- Dependent bounded differences inequality for matrices
- General exchangeable matrix pairs (Paulin, Mackey, and Tropp, 2016)
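The dilation trick above is two lines of code: embed a rectangular B into a Hermitian block matrix and check that the top eigenvalue equals the spectral norm of B (here B is real, so B* is just the transpose).

```python
import numpy as np

rng = np.random.default_rng(8)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2))

# Hermitian dilation: eigenvalues of D are +/- the singular values of B.
D = np.block([[np.zeros((d1, d1)), B],
              [B.T,                np.zeros((d2, d2))]])

lam_max = np.linalg.eigvalsh(D).max()
spec = np.linalg.norm(B, 2)         # largest singular value of B
print(np.isclose(lam_max, spec))    # True
```

This is how results stated for Hermitian matrices, like the tail bounds above, transfer to arbitrary rectangular matrices.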

SLIDE 41

Extensions

References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973.
Cai, J. F., Candès, E. J., and Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 2010.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
Candès, E. J. and Plan, Y. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Cheung, S.-S., So, A. M.-C., and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: a safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Fazel, M., Hindi, H., and Boyd, S. P. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the 2001 American Control Conference, pp. 4734–4739, 2001.
Frieze, A., Kannan, R., and Vempala, S. Fast Monte-Carlo algorithms for finding low-rank approximations. In Foundations of Computer Science, 1998.
Goreinov, S. A., Tyrtyshnikov, E. E., and Zamarashkin, N. L. A theory of pseudoskeleton approximations. Linear Algebra and its Applications, 261(1-3):1–21, 1997.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.

SLIDE 42

Extensions

References II

Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Keshavan, R. H., Montanari, A., and Oh, S. Matrix completion from noisy entries. Journal of Machine Learning Research, 99:2057–2078, 2010.
Lin, Z., Chen, M., Wu, L., and Ma, Y. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215, 2009.
Lust-Piquard, F. Inégalités de Khintchine dans Cp (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. The Annals of Probability, 42(3):906–945, 2014a.
Mackey, L., Talwalkar, A., and Jordan, M. I. Distributed matrix completion and robust factorization. Journal of Machine Learning Research, 2014b. In press.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007.

SLIDE 43

Extensions

References III

Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Paulin, D., Mackey, L., and Tropp, J. A. Efron-Stein inequalities for random matrices. The Annals of Probability, to appear, 2016.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, Jul. 2007.
So, A. M.-C. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Talwalkar, A., Mackey, L., Mu, Y., Chang, S.-F., and Jordan, M. I. Distributed low-rank subspace segmentation. December 2013.
Toh, K. and Yun, S. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pacific Journal of Optimization, 6(3):615–640, 2010.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.