

1. Low-Rank Matrix Approximation with Stability

Dongsheng Li¹, Chao Chen², Qin (Christine) Lv³, Junchi Yan¹, Li Shang³, Stephen M. Chu¹
¹ IBM Research - China; ² Tongji University; ³ University of Colorado Boulder

2. Problem Formulation

Low-Rank Matrix Approximation (LRMA): find $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ such that $\hat R = UV^T$. The optimization problem of LRMA can be described as follows:

$$\hat R = \arg\min_X \mathrm{Loss}(R, X), \quad \text{s.t. } \mathrm{rank}(X) = r$$

Example: the user-item rating matrix used by recommender systems.
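To make the formulation concrete, here is a minimal sketch (toy data and variable names of my own choosing, not part of the talk) of rank-r approximation of a fully observed matrix via truncated SVD, which minimizes the squared loss under the rank constraint by the Eckart-Young theorem. In the recommender setting R is only partially observed, so the loss is minimized over the observed set Ω instead, e.g., by RSVD.

```python
# Toy sketch: rank-r LRMA of a fully observed matrix via truncated SVD.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 100, 80, 5
R = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))  # low-rank "ratings"
R += 0.1 * rng.normal(size=(m, n))                     # observation noise

U_full, s, Vt = np.linalg.svd(R, full_matrices=False)
U = U_full[:, :r] * s[:r]   # m x r factor (singular values folded into U)
V = Vt[:r].T                # n x r factor
R_hat = U @ V.T             # rank-r approximation, R_hat = U V^T

print(f"rank-{r} RMSE: {np.sqrt(np.mean((R - R_hat) ** 2)):.4f}")
```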

3. Problem Formulation

Generalization performance is a challenge for matrix approximation when the data are sparse, incomplete, and noisy [Keshavan et al., 2010; Candès & Recht, 2012]:
- models are biased toward the limited training data (sparse, incomplete);
- small changes in the training data (noisy) may significantly change the models.

Algorithmic stability has been introduced to investigate the generalization error bounds of learning algorithms [Bousquet & Elisseeff, 2001; 2002]. A stable learning algorithm has the following properties:
- slightly changing the training set does not significantly change the output;
- the training error has small variance;
- the training errors are close to the test errors.

4. Stability w.r.t. Matrix Approximation

Definition (Stability w.r.t. Matrix Approximation). For any $R \in \mathbb{F}^{m \times n}$, choose a subset of entries $\Omega$ from $R$ uniformly. For a given $\epsilon > 0$, we say that $D_\Omega(\hat R)$ is $\delta$-stable if the following holds:

$$\Pr\left[\,\big|D(\hat R) - D_\Omega(\hat R)\big| \le \epsilon\,\right] \ge 1 - \delta.$$

Figure: Stability vs. generalization error of RSVD on the MovieLens (1M) dataset (y-axis: percentage of runs; x-axis: RMSE difference). Rank $r = 5, 10, 15, 20$ and $\epsilon = 0.0046$; 500 runs.
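The $\delta$ in this definition can be estimated empirically by Monte Carlo, in the spirit of the 500-run experiment in the figure. Below is a minimal sketch on assumed toy data (the matrix sizes, rank, and sample size are mine; only $\epsilon = 0.0046$ and the 500 runs come from the slide):

```python
# Monte Carlo sketch of the stability definition: D(R_hat) is the RMSE over
# all entries, D_Omega(R_hat) the RMSE over a uniformly drawn subset Omega,
# and delta is estimated as the fraction of runs where they differ by > eps.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 100, 80, 5
R = rng.normal(size=(m, r)) @ rng.normal(size=(r, n)) + 0.1 * rng.normal(size=(m, n))
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = (U[:, :r] * s[:r]) @ Vt[:r]           # fixed rank-r approximation

D_full = np.sqrt(np.mean((R - R_hat) ** 2))   # D(R_hat)
eps, runs, size = 0.0046, 500, 2000
violations = 0
for _ in range(runs):
    flat = rng.choice(m * n, size=size, replace=False)  # uniform Omega
    i, j = np.unravel_index(flat, (m, n))
    D_Omega = np.sqrt(np.mean((R[i, j] - R_hat[i, j]) ** 2))
    violations += abs(D_full - D_Omega) > eps
print(f"estimated delta = {violations / runs:.3f} at eps = {eps}")
```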

5. Theoretical Analysis

Theorem. Let $\Omega$ ($|\Omega| > 2$) be a set of observed entries in $R$. Let $\omega \subset \Omega$ be a subset of observed entries which satisfies $\forall (i,j) \in \omega$, $|R_{i,j} - \hat R_{i,j}| \le D_\Omega(\hat R)$. Let $\Omega' = \Omega - \omega$. Then, for any $\epsilon > 0$ and $1 > \lambda_0, \lambda_1 > 0$ ($\lambda_0 + \lambda_1 = 1$), if $\lambda_0 D_\Omega(\hat R) + \lambda_1 D_{\Omega'}(\hat R)$ and $D_\Omega(\hat R)$ are $\delta_1$-stable and $\delta_2$-stable, respectively, then $\delta_1 \le \delta_2$.

Remark 1. If we select a subset of entries $\Omega'$ from $\Omega$ that are harder to predict than average, then minimizing $\lambda_0 D_\Omega(\hat R) + \lambda_1 D_{\Omega'}(\hat R)$ will be more stable than minimizing $D_\Omega(\hat R)$ alone.
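The toy sketch below (hypothetical data and weights, not from the paper) merely illustrates the construction in the theorem: $\omega$ collects the entries predicted at least as well as the average error $D_\Omega(\hat R)$, so $\Omega' = \Omega - \omega$ holds the harder-than-average entries, and the stabilized objective mixes the two RMSEs.

```python
# Illustration of Theorem 1's construction of omega and Omega'.
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 40
R = rng.normal(size=(m, n))
R_hat = R + 0.2 * rng.normal(size=(m, n))    # stand-in approximation of R

Omega = rng.random((m, n)) < 0.3             # observed entries
err = np.abs(R - R_hat)
D_Omega = np.sqrt(np.mean((R[Omega] - R_hat[Omega]) ** 2))

omega = Omega & (err <= D_Omega)             # easy entries, per the theorem
Omega_p = Omega & (err > D_Omega)            # Omega' = Omega - omega
D_Omega_p = np.sqrt(np.mean((R[Omega_p] - R_hat[Omega_p]) ** 2))

lam0 = lam1 = 0.5                            # lam0 + lam1 = 1
print(f"D_Omega = {D_Omega:.3f}, D_Omega' = {D_Omega_p:.3f}, "
      f"stabilized loss = {lam0 * D_Omega + lam1 * D_Omega_p:.3f}")
```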

6. Theoretical Analysis

Theorem. Let $\Omega$ ($|\Omega| > 2$) be a set of observed entries in $R$. Let $\omega_2 \subset \omega_1 \subset \Omega$, where $\omega_1$ and $\omega_2$ satisfy $\forall (i,j) \in \omega_1$ (resp. $\omega_2$), $|R_{i,j} - \hat R_{i,j}| \le D_\Omega(\hat R)$. Let $\Omega_1 = \Omega - \omega_1$ and $\Omega_2 = \Omega - \omega_2$. Then, for any $\epsilon > 0$ and $1 > \lambda_0, \lambda_1 > 0$ ($\lambda_0 + \lambda_1 = 1$), if $\lambda_0 D_\Omega(\hat R) + \lambda_1 D_{\Omega_1}(\hat R)$ and $\lambda_0 D_\Omega(\hat R) + \lambda_1 D_{\Omega_2}(\hat R)$ are $\delta_1$-stable and $\delta_2$-stable, respectively, then $\delta_1 \le \delta_2$.

Remark 2. Removing more of the easy-to-predict entries yields more stable matrix approximation.

7. Theoretical Analysis

Theorem. Let $\Omega$ ($|\Omega| > 2$) be a set of observed entries in $R$. Let $\omega_1, \ldots, \omega_K \subset \Omega$ ($K > 1$) satisfy $\forall (i,j) \in \omega_k$ ($1 \le k \le K$), $|R_{i,j} - \hat R_{i,j}| \le D_\Omega(\hat R)$. Let $\Omega_k = \Omega - \omega_k$ for all $1 \le k \le K$. Then, for any $\epsilon > 0$ and $1 > \lambda_0, \lambda_1, \ldots, \lambda_K > 0$ ($\sum_{i=0}^{K} \lambda_i = 1$), if $\lambda_0 D_\Omega(\hat R) + \sum_{k=1}^{K} \lambda_k D_{\Omega_k}(\hat R)$ and $(\lambda_0 + \lambda_K) D_\Omega(\hat R) + \sum_{k=1}^{K-1} \lambda_k D_{\Omega_k}(\hat R)$ are $\delta_1$-stable and $\delta_2$-stable, respectively, then $\delta_1 \le \delta_2$.

Remark 3. Minimizing $D_\Omega$ together with the RMSEs of more than one hard-to-predict subset of $\Omega$ helps produce more stable matrix approximation solutions.

8. New Optimization Problem

We propose the SMA (Stable MA) framework, which is generally applicable to any LRMA method. E.g., a new extension of SVD:

$$\hat R = \arg\min_X \left[ \lambda_0 D_\Omega(X) + \sum_{s=1}^{K} \lambda_s D_{\Omega_s}(X) \right] \quad \text{s.t. } \mathrm{rank}(X) = r \qquad (1)$$

where $\lambda_0, \lambda_1, \ldots, \lambda_K$ define the contributions of each component in the loss function. (Extensions to other LRMA methods can be derived similarly.)
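A direct transcription of the objective in Eq. (1) as a loss evaluator might look as follows (a sketch: the mask-based representation of $\Omega$ and the $\Omega_s$ is my choice, and the rank constraint is assumed to be carried by the factorization $X = UV^T$):

```python
# Sketch of the SMA objective in Eq. (1); Omega and the Omega_s are boolean
# masks over R's entries, and each D term is an RMSE over its mask.
import numpy as np

def sma_objective(R, X, Omega, Omegas, lam0, lams):
    """lam0 * D_Omega(X) + sum_s lams[s] * D_{Omega_s}(X)."""
    def D(mask):
        return np.sqrt(np.mean((R[mask] - X[mask]) ** 2))
    return lam0 * D(Omega) + sum(l * D(m) for l, m in zip(lams, Omegas))
```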

9. The SMA Learning Algorithm

Require: $R$ is the target matrix, $\Omega$ is the set of observed entries in $R$, and $\hat R$ is an approximation of $R$ by an existing LRMA method. $p > 0.5$ is the predefined probability for entry selection. $\mu_1$ and $\mu_2$ are the coefficients for L2 regularization.

1: $\Omega_0 = \emptyset$
2: for each $(i,j) \in \Omega$ do
3:   randomly generate $\rho \in [0,1]$
4:   if $(|R_{i,j} - \hat R_{i,j}| \le D_\Omega$ and $\rho \le p)$ or $(|R_{i,j} - \hat R_{i,j}| > D_\Omega$ and $\rho \le 1 - p)$ then
5:     $\Omega_0 \leftarrow \Omega_0 \cup \{(i,j)\}$
6:   end if
7: end for
8: randomly divide $\Omega_0$ into $\omega_1, \ldots, \omega_K$ ($\bigcup_{k=1}^{K} \omega_k = \Omega_0$)
9: for all $k \in [1, K]$, $\Omega_k = \Omega - \omega_k$
10: $(\hat U, \hat V) := \arg\min_{U,V} \big[ \sum_{k=1}^{K} \lambda_k D_{\Omega_k}(UV^T) + \lambda_0 D_\Omega(UV^T) + \mu_1 \|U\|^2 + \mu_2 \|V\|^2 \big]$
11: return $\hat R = \hat U \hat V^T$
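Below is a minimal Python sketch of this algorithm under assumed hyperparameters. One simplification to be explicit about: step 10 sums RMSE terms, which are not separable per entry; the sketch instead minimizes a weighted sum of squared errors, where an observed entry's weight is $\lambda_0$ plus the $\lambda_k$ of every $\Omega_k$ containing it (so entries of $\omega_k$ are merely down-weighted by $\lambda_k$). The released code at https://github.com/ldscc/StableMA.git is the authoritative implementation.

```python
# Sketch of the SMA learning algorithm: assumed hyperparameters, plain SGD,
# and weighted squared errors standing in for the RMSE terms of step 10.
import numpy as np

def sma_train(R, Omega, R_hat0, K=3, p=0.8, lam0=0.4,
              mu1=0.02, mu2=0.02, rank=10, lr=0.005, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    m, n = R.shape
    lams = np.full(K, (1.0 - lam0) / K)      # lambda_1..K; all lambdas sum to 1

    # Steps 1-7: build Omega_0 by keeping easy entries w.p. p, hard ones w.p. 1-p.
    D_Omega = np.sqrt(np.mean((R[Omega] - R_hat0[Omega]) ** 2))
    ii, jj = np.nonzero(Omega)
    easy = np.abs(R[ii, jj] - R_hat0[ii, jj]) <= D_Omega
    rho = rng.random(ii.size)
    in_Omega0 = np.where(easy, rho <= p, rho <= 1 - p)

    # Step 8: randomly divide Omega_0 into omega_1, ..., omega_K.
    subset_id = np.full(ii.size, -1)
    subset_id[in_Omega0] = rng.integers(0, K, in_Omega0.sum())

    # Step 9, folded into per-entry weights: an entry of omega_k is absent from
    # Omega_k only, so its weight drops from 1 (= lam0 + sum of lams) by lam_k.
    w = np.ones(ii.size)
    w[in_Omega0] -= lams[subset_id[in_Omega0]]

    # Step 10: SGD on the weighted squared loss with L2 regularization.
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    for _ in range(epochs):
        for t in rng.permutation(ii.size):
            i, j = ii[t], jj[t]
            ui, vj = U[i].copy(), V[j].copy()
            e = R[i, j] - ui @ vj
            U[i] += lr * (w[t] * e * vj - mu1 * ui)
            V[j] += lr * (w[t] * e * ui - mu2 * vj)

    return U @ V.T                           # Step 11: R_hat = U V^T
```

The warm start R_hat0 can come from any existing LRMA method (e.g., RSVD, or the truncated-SVD sketch earlier); in this sketch it only supplies the $D_\Omega$ threshold used for entry selection.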

10. Experiments

Datasets:
- MovieLens 10M (~70k users, 10k items, 10^7 ratings)
- Netflix (~480k users, 18k items, 10^8 ratings)

Performance comparison with four single MA models and three ensemble MA models:
- Regularized SVD [Paterek et al., KDD'07]
- BPMF [Salakhutdinov et al., ICML'08]
- APG [Toh et al., PJO'10]
- GSMF [Yuan et al., AAAI'14]
- DFC [Mackey et al., NIPS'11]
- LLORMA [Lee et al., ICML'13]
- WEMAREC [our prior work, SIGIR'15]

11. Experiments: Generalization Performance

Figure: Training and test errors vs. epochs of RSVD and SMA on the MovieLens 10M dataset (RMSE vs. epochs; curves for the training and test sets of each method).

12. Experiments: Sensitivity of Subset Number K

Figure: Effect of subset number K (1-5) on the MovieLens 10M dataset (left) and the Netflix dataset (right); RMSE vs. #Subsets. SMA and RSVD are shown as solid lines, the other compared methods (BPMF, APG, GSMF, DFC, LLORMA, WEMAREC) as dotted lines.

13. Experiments: Sensitivity of Rank r

Figure: Effect of rank r (50-250) on the MovieLens 10M dataset (left) and the Netflix dataset (right); RMSE vs. rank. SMA and RSVD are shown as solid lines, the other compared methods as dotted lines.

14. Experiments: Sensitivity of Training Set Size

Figure: RMSEs of SMA and four single methods (RSVD, BPMF, APG, GSMF) with varying training set ratio (20%-80%) on the MovieLens 10M dataset (rank r = 50).

15. Experiments

Table: RMSE comparison of SMA and seven other methods

Method      MovieLens (10M)       Netflix
RSVD        0.8256 ± 0.0006       0.8534 ± 0.0001
BPMF        0.8197 ± 0.0004       0.8421 ± 0.0002
APG         0.8101 ± 0.0003       0.8476 ± 0.0003
GSMF        0.8012 ± 0.0011       0.8420 ± 0.0006
DFC         0.8067 ± 0.0002       0.8453 ± 0.0003
LLORMA      0.7855 ± 0.0002       0.8275 ± 0.0004
WEMAREC     0.7775 ± 0.0007       0.8143 ± 0.0001
SMA         0.7682 ± 0.0003       0.8036 ± 0.0004

16. Conclusion

SMA (Stable MA), a new low-rank matrix approximation framework, is proposed. It can:
- achieve high stability, i.e., high generalization performance;
- achieve better accuracy than state-of-the-art MA-based CF methods;
- achieve good accuracy on very sparse datasets.

Source code available at: https://github.com/ldscc/StableMA.git
