Divide-and-Conquer Matrix Factorization — Lester Mackey (PowerPoint presentation)


  1. Divide-and-Conquer Matrix Factorization
     Lester Mackey† — Collaborators: Ameet Talwalkar‡, Michael I. Jordan††
     †Stanford University  ‡UCLA  ††UC Berkeley
     December 14, 2015

  2. Introduction — Motivation: Large-scale Matrix Completion
     Goal: Estimate a matrix L0 ∈ R^{m×n} given a subset of its entries:

         [ ?  ?  1  ...  4 ]        [ 2  3  1  ...  4 ]
         [ 3  ?  ?  ...  ? ]   →    [ 3  4  5  ...  1 ]
         [ ?  5  ?  ...  5 ]        [ 2  5  3  ...  5 ]

     Examples:
     - Collaborative filtering: How will user i rate movie j?
       (Netflix: 40 million users, 200K movies and television shows)
     - Ranking on the web: Is URL j relevant to user i?
       (Google News: millions of articles, 1 billion users)
     - Link prediction: Is user i friends with user j?
       (Facebook: 1.5 billion users)

  3. Introduction — Motivation: Large-scale Matrix Completion
     Goal: Estimate a matrix L0 ∈ R^{m×n} given a subset of its entries:

         [ ?  ?  1  ...  4 ]        [ 2  3  1  ...  4 ]
         [ 3  ?  ?  ...  ? ]   →    [ 3  4  5  ...  1 ]
         [ ?  5  ?  ...  5 ]        [ 2  5  3  ...  5 ]

     State-of-the-art MC algorithms:
     - Strong estimation guarantees
     - Plagued by expensive subroutines (e.g., truncated SVD)

     This talk: Present divide-and-conquer approaches for scaling up any MC
     algorithm while maintaining strong estimation guarantees.

  4. Matrix Completion Background — Exact Matrix Completion
     Goal: Estimate a matrix L0 ∈ R^{m×n} given a subset of its entries

  5. Matrix Completion Background — Noisy Matrix Completion
     Goal: Given entries from a matrix M = L0 + Z ∈ R^{m×n}, where Z is
     entrywise noise and L0 has rank r ≪ m, n, estimate L0.
     Good news: L0 has ∼ (m + n)r ≪ mn degrees of freedom.
         Factored form: L0 = AB⊤ for A ∈ R^{m×r} and B ∈ R^{n×r}
     Bad news: Not all low-rank matrices can be recovered.
     Question: What can go wrong?
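The degree-of-freedom count above can be made concrete with a short sketch (my own illustrative code, not from the slides): a rank-r matrix stored in factored form A B⊤ needs only (m + n)r numbers rather than mn.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 100, 80, 5
A = rng.standard_normal((m, r))   # A in R^{m x r}
B = rng.standard_normal((n, r))   # B in R^{n x r}
L0 = A @ B.T                      # rank-r matrix in R^{m x n}

print(np.linalg.matrix_rank(L0))  # 5
print((m + n) * r, "vs", m * n)   # 900 vs 8000: far fewer parameters
```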

  6. Matrix Completion Background — What can go wrong?
     Entire column missing:

         [ 1  2  ?  3  ...  4 ]
         [ 3  5  ?  4  ...  1 ]
         [ 2  5  ?  2  ...  5 ]

     No hope of recovery!
     Solution: Uniform observation model. Assume that the set of s observed
     entries Ω is drawn uniformly at random: Ω ∼ Unif(m, n, s)
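The observation model Ω ∼ Unif(m, n, s) can be sketched as follows (helper name and implementation are mine): draw s entry indices uniformly at random without replacement, which rules out pathologies like a fully missing column with high probability.

```python
import numpy as np

def sample_omega(m, n, s, rng):
    """Draw s distinct (i, j) index pairs uniformly from an m x n grid."""
    flat = rng.choice(m * n, size=s, replace=False)
    return np.stack(np.unravel_index(flat, (m, n)), axis=1)  # shape (s, 2)

rng = np.random.default_rng(0)
omega = sample_omega(20, 30, 50, rng)
print(omega.shape)  # (50, 2)
```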

  7. Matrix Completion Background — What can go wrong?
     Bad spread of information:

         L = e1 e1⊤ = [ 1  0  0  0 ]
                      [ 0  0  0  0 ]
                      [ 0  0  0  0 ]

     Can only recover L if L11 is observed.
     Solution: Incoherence with the standard basis (Candès and Recht, 2009).
     A matrix L = UΣV⊤ ∈ R^{m×n} with rank(L) = r is incoherent if its
     singular vectors are not too skewed:
         max_i ‖UU⊤e_i‖² ≤ µr/m    and    max_i ‖VV⊤e_i‖² ≤ µr/n
     and not too cross-correlated:
         ‖UV⊤‖∞ ≤ √(µr/(mn))
     (In this literature, it's good to be incoherent.)
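The skewness conditions can be checked numerically. Below is a sketch (the `coherence` helper is my own) that computes the smallest µ satisfying the two singular-vector bounds, using the fact that ‖UU⊤e_i‖² is the squared norm of row i of U. A random low-rank matrix comes out incoherent, while the spiky example L = e1 e1⊤ attains the maximal value µ = m.

```python
import numpy as np

def coherence(L):
    """Smallest mu such that max_i ||UU^T e_i||^2 <= mu*r/m and likewise for V."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    r = int(np.sum(s > 1e-10))
    U, V = U[:, :r], Vt[:r].T
    m, n = L.shape
    mu_U = (m / r) * np.max(np.sum(U**2, axis=1))
    mu_V = (n / r) * np.max(np.sum(V**2, axis=1))
    return max(mu_U, mu_V)

rng = np.random.default_rng(0)
# A generic rank-2 matrix: information is well spread, so mu is small.
L_good = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
# The spiky matrix e1 e1^T: all information sits in one entry, mu = m = 50.
L_bad = np.zeros((50, 40)); L_bad[0, 0] = 1.0
print(coherence(L_good), coherence(L_bad))
```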

  8. Matrix Completion Background — How do we estimate L0?
     First attempt:
         minimize_A  rank(A)
         subject to  Σ_{(i,j)∈Ω} (A_ij − M_ij)² ≤ ∆²
     Problem: Computationally intractable!
     Solution: Solve a convex relaxation (Fazel, Hindi, and Boyd, 2001;
     Candès and Plan, 2010):
         minimize_A  ‖A‖∗
         subject to  Σ_{(i,j)∈Ω} (A_ij − M_ij)² ≤ ∆²
     where ‖A‖∗ = Σ_k σ_k(A) is the trace/nuclear norm of A.
     Questions:
     - Will the nuclear norm heuristic successfully recover L0?
     - Can nuclear norm minimization scale to large MC problems?
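The relaxation swaps the rank (number of nonzero singular values) for the nuclear norm (their sum). A minimal sketch of the two quantities side by side (the helper is mine; NumPy also exposes this directly as `np.linalg.norm(A, 'nuc')`):

```python
import numpy as np

def nuclear_norm(A):
    """||A||_* = sum of singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()

A = np.diag([3.0, 2.0, 0.0])
print(nuclear_norm(A))           # 5.0  (= 3 + 2), the convex surrogate
print(np.linalg.matrix_rank(A))  # 2    (number of nonzero singular values)
```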

  9. Matrix Completion Background — Noisy Nuclear Norm Heuristic: Does it work?
     Yes, with high probability.
     Typical Theorem: If L0 with rank r is incoherent, s ≳ rn log²(n) entries
     of M ∈ R^{m×n} are observed uniformly at random, and L̂ solves the noisy
     nuclear norm heuristic, then
         ‖L̂ − L0‖F ≤ f(m, n)∆
     with high probability when ‖M − L0‖F ≤ ∆.
     See Candès and Plan (2010); Mackey, Talwalkar, and Jordan (2014b);
     Keshavan, Montanari, and Oh (2010); Negahban and Wainwright (2010).
     Implies exact recovery in the noiseless setting (∆ = 0).

  10. Matrix Completion Background — Noisy Nuclear Norm Heuristic: Does it scale?
     Not quite...
     Standard interior point methods (Candès and Recht, 2009):
         O(|Ω|(m + n)³ + |Ω|²(m + n)² + |Ω|³)
     More efficient, tailored algorithms:
     - Singular Value Thresholding (SVT) (Cai, Candès, and Shen, 2010)
     - Augmented Lagrange Multiplier (ALM) (Lin, Chen, Wu, and Ma, 2009a)
     - Accelerated Proximal Gradient (APG) (Toh and Yun, 2010)
     All require a rank-k truncated SVD on every iteration.
     Take-away: These provably accurate MC algorithms are too expensive for
     large-scale or real-time matrix completion.
     Question: How can we scale up a given matrix completion algorithm and
     still retain estimation guarantees?
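The per-iteration SVD cost that all three tailored solvers share comes from the singular value shrinkage operator D_τ(X) = U max(Σ − τI, 0) V⊤, the proximal map of the nuclear norm. A sketch of that one step (my own simplified, dense implementation; the solvers above use truncated/partial SVDs for speed):

```python
import numpy as np

def svd_shrink(X, tau):
    """Soft-threshold the singular values of X by tau (prox of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

X = np.diag([5.0, 1.0, 0.2])
Y = svd_shrink(X, 1.0)            # singular values 5, 1, 0.2 -> 4, 0, 0
print(np.linalg.matrix_rank(Y))   # 1: shrinkage zeroes small singular values
```

This is why each iteration is expensive: a full (or rank-k truncated) SVD of an m × n iterate must be computed before the shrinkage can be applied.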

  11. Matrix Completion — Divide-Factor-Combine (DFC)
     Our solution: Divide and conquer.
     1. Divide M into submatrices.
     2. Factor each submatrix in parallel.
     3. Combine submatrix estimates to estimate L0.
     Advantages:
     - Submatrix completion is often much cheaper than completing M.
     - Multiple submatrix completions can be carried out in parallel.
     - DFC works with any base MC algorithm.
     - With the right choice of division and recombination, yields estimation
       guarantees comparable to those of the base algorithm.

  12. Matrix Completion — DFC-Proj: Partition and Project
     1. Randomly partition M into t column submatrices,
            M = [C1 C2 ··· Ct], where each Ci ∈ R^{m×l}
     2. Complete the submatrices in parallel to obtain [Ĉ1 Ĉ2 ··· Ĉt]
        - Reduced cost: Expect t-fold speed-up per iteration
        - Parallel computation: Pay the cost of one cheaper MC
     3. Project the submatrices onto a single low-dimensional column space:
        estimate the column space of L0 with the column space of Ĉ1,
            L̂proj = Ĉ1 Ĉ1⁺ [Ĉ1 Ĉ2 ··· Ĉt]
        - Common technique for randomized low-rank approximation
          (Frieze, Kannan, and Vempala, 1998)
        - Minimal cost: O(mk² + lk²) where k = rank(L̂proj)
     4. Ensemble: Project onto the column space of each Ĉj and average
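The combine step (step 3) can be sketched in a few lines. This is my own simplified illustration under strong assumptions: the base completions Ĉi are taken as given (here, exact), and every block is projected onto the column space of Ĉ1 via its pseudoinverse, exactly as in the formula L̂proj = Ĉ1 Ĉ1⁺ [Ĉ1 ··· Ĉt].

```python
import numpy as np

def dfc_proj(C_hats):
    """Project all completed column blocks onto the column space of the first."""
    C1 = C_hats[0]
    P = C1 @ np.linalg.pinv(C1)            # orthogonal projector onto col(C1)
    return np.hstack([P @ C for C in C_hats])

# Toy check: with exact block completions of a rank-2 matrix, a random
# column block already spans the full column space, so recovery is exact.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 12))
blocks = np.hsplit(L0, 4)                  # t = 4 column submatrices (l = 3)
L_proj = dfc_proj(blocks)
print(np.allclose(L_proj, L0))             # True
```

In the ensemble variant (step 4), the same projection would be repeated with each Ĉj playing the role of Ĉ1, and the t resulting estimates averaged.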

  13. Matrix Completion — DFC: Does it work? Yes, with high probability.
     Theorem (Mackey, Talwalkar, and Jordan, 2014b): If L0 with rank r is
     incoherent and s = ω(r²n log²(n)/ε²) entries of M ∈ R^{m×n} are observed
     uniformly at random, then l = o(n) random columns suffice to have
         ‖L̂proj − L0‖F ≤ (2 + ε) f(m, n)∆
     with high probability when ‖M − L0‖F ≤ ∆ and the noisy nuclear norm
     heuristic is used as the base algorithm.
     - Can sample a vanishingly small fraction of columns (l/n → 0)
     - Implies exact recovery in the noiseless (∆ = 0) setting
     - Analysis streamlined by the matrix Bernstein inequality

  14. Matrix Completion — DFC: Does it work? Yes, with high probability.
     Proof ideas:
     1. If L0 is incoherent (has good spread of information), its partitioned
        submatrices are incoherent w.h.p.
     2. Each submatrix has sufficiently many observed entries w.h.p.
        ⇒ Submatrix completion succeeds
     3. A random submatrix captures the full column space of L0 w.h.p.
        (Analysis builds on the randomized ℓ2 regression work of Drineas,
        Mahoney, and Muthukrishnan, 2008)
        ⇒ Column projection succeeds

  15. Matrix Completion — Simulations: DFC Noisy Recovery Error
     [Figure: RMSE vs. % revealed entries for Proj-10%, Proj-Ens-10%, and
     Base-MC. Recovery error of DFC relative to the base algorithm (APG)
     with m = 10K and r = 10.]

  16. Matrix Completion — Simulations: DFC Speed-up
     [Figure: running time (s) vs. m for Proj-10%, Proj-Ens-10%, and Base-MC.
     Speed-up over the base algorithm (APG) for random matrices with
     r = 0.001m and 4% of entries revealed.]

  17. Matrix Completion — Application: Collaborative filtering
     Task: Given a sparsely observed matrix of user-item ratings, predict the
     unobserved ratings.
     Issues:
     - Full-rank rating matrix
     - Noisy, non-uniform observations
     The data: Netflix Prize dataset¹
     - 100 million ratings in {1, ..., 5}
     - 17,770 movies, 480,189 users
     ¹ http://www.netflixprize.com/
