Matrix Factorization and Factorization Machines for Recommender Systems
Chih-Jen Lin
Department of Computer Science, National Taiwan University
Talk at SDM workshop on Machine Learning Methods on Recommender Systems, May 2, 2015

Outline
1. Matrix factorization
2. Factorization machines
3. Conclusions

In this talk I will briefly discuss two related topics:
• Fast matrix factorization (MF) in shared-memory systems
• Factorization machines (FM) for recommender systems and classification/regression
Note that MF is a special case of FM.

Outline
1. Matrix factorization
   • Introduction and issues for parallelization
   • Our approach in the package LIBMF
2. Factorization machines
3. Conclusions

Matrix factorization: Introduction and issues for parallelization

Matrix Factorization
• Matrix factorization is an effective method for recommender systems (e.g., Netflix Prize and KDD Cup 2011)
• But training is slow.
• We developed a parallel MF package LIBMF for shared-memory systems: http://www.csie.ntu.edu.tw/~cjlin/libmf
• Best paper award at ACM RecSys 2013

Matrix Factorization (Cont’d)
For recommender systems: a group of users give ratings to some items.
User   Item   Rating
1      5      100
1      10     80
1      13     30
...    ...    ...
u      v      r
...    ...    ...
The information can be represented by a rating matrix $R$.

Matrix Factorization (Cont’d)
[Figure: the $m \times n$ rating matrix $R$, with rows indexed by users $1, 2, \ldots, u, \ldots, m$ and columns by items $1, 2, \ldots, v, \ldots, n$; entry $(u, v)$ holds $r_{u,v}$, and the unobserved entry $(2, 2)$ is marked $?_{2,2}$]
• $m, n$: numbers of users and items
• $u, v$: indices of the $u$th user and the $v$th item
• $r_{u,v}$: the rating that the $u$th user gives to the $v$th item

Matrix Factorization (Cont’d)
[Figure: $R \approx P^T \times Q$, where $R$ is $m \times n$, $P^T$ is $m \times k$ with rows $p_1^T, p_2^T, \ldots, p_u^T, \ldots, p_m^T$, and $Q$ is $k \times n$ with columns $q_1, q_2, \ldots, q_v, \ldots, q_n$]
• $k$: number of latent dimensions
• $r_{u,v} = p_u^T q_v$
• $?_{2,2} = p_2^T q_2$
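To make the shapes on this slide concrete, here is a minimal NumPy sketch of predicting a missing rating from the factors. The numbers in `P` and `Q` are made up for illustration (with $k = 2$, $m = 2$, $n = 2$) and do not come from the talk.

```python
import numpy as np

# Hypothetical factors: P is k x m (column u is p_u), Q is k x n (column v is q_v).
P = np.array([[1.0, 2.0],
              [0.5, 1.0]])
Q = np.array([[2.0, 1.0],
              [2.0, 4.0]])

# The full approximation R ~ P^T Q is an m x n matrix.
R_hat = P.T @ Q

# The slide's ?_{2,2} = p_2^T q_2 is index [1, 1] with 0-based indexing.
pred_22 = P[:, 1] @ Q[:, 1]
```

Only the observed entries of $R$ are used for training; the product fills in every entry, including the unobserved ones.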

Matrix Factorization (Cont’d)
A non-convex optimization problem:
$$\min_{P,Q} \sum_{(u,v) \in R} \left( (r_{u,v} - p_u^T q_v)^2 + \lambda_P \|p_u\|_F^2 + \lambda_Q \|q_v\|_F^2 \right)$$
• $\lambda_P$ and $\lambda_Q$ are regularization parameters
• SG (Stochastic Gradient) is now a popular optimization method for MF
• It loops over ratings in the training set.

Matrix Factorization (Cont’d)
SG update rule:
$$p_u \leftarrow p_u + \gamma (e_{u,v} q_v - \lambda_P p_u), \qquad q_v \leftarrow q_v + \gamma (e_{u,v} p_u - \lambda_Q q_v)$$
where $e_{u,v} \equiv r_{u,v} - p_u^T q_v$.
SG is inherently sequential.
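The update rule above can be sketched as a plain sequential loop. This is a minimal NumPy illustration, not LIBMF's code: the function names, the toy ratings, the initialization, and the values of $\gamma$, $\lambda_P$, $\lambda_Q$ are all assumptions chosen for the example.

```python
import numpy as np

def sg_epoch(ratings, P, Q, gamma, lam_p, lam_q, rng):
    """One pass of stochastic gradient over the observed ratings.

    ratings: list of (u, v, r) triples with 0-based indices.
    P: k x m array whose column u is p_u; Q: k x n array whose column v is q_v.
    """
    for idx in rng.permutation(len(ratings)):
        u, v, r = ratings[idx]
        p_u = P[:, u].copy()   # keep the old p_u so the q_v update uses it
        q_v = Q[:, v].copy()
        e = r - p_u @ q_v      # e_{u,v} = r_{u,v} - p_u^T q_v
        P[:, u] = p_u + gamma * (e * q_v - lam_p * p_u)
        Q[:, v] = q_v + gamma * (e * p_u - lam_q * q_v)

def rmse(ratings, P, Q):
    """Root mean squared error over the given ratings."""
    errs = [r - P[:, u] @ Q[:, v] for u, v, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))

# Toy data (hypothetical): 3 users, 3 items, 5 observed ratings.
rng = np.random.default_rng(0)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0), (2, 2, 2.0)]
k, m, n = 2, 3, 3
P = rng.uniform(0.0, 0.5, size=(k, m))
Q = rng.uniform(0.0, 0.5, size=(k, n))

before = rmse(ratings, P, Q)
for _ in range(200):
    sg_epoch(ratings, P, Q, gamma=0.02, lam_p=0.01, lam_q=0.01, rng=rng)
after = rmse(ratings, P, Q)
```

Each update reads and writes both $p_u$ and $q_v$, so two updates that touch the same user or the same item cannot safely run at the same time. That is the sense in which SG is inherently sequential.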

SG for Parallel MF
After $r_{3,3}$ is selected, ratings in gray blocks cannot be updated.
[Figure: a $6 \times 6$ rating matrix; the gray region covers the entries that share $p_3$ or $q_3$ with $r_{3,3} = p_3^T q_3$, for example $r_{3,1} = p_3^T q_1$, $r_{3,2} = p_3^T q_2$, ..., $r_{3,6} = p_3^T q_6$]
But $r_{6,6} = p_6^T q_6$ can be used.

SG for Parallel MF (Cont’d)
We can split the matrix into blocks. Then use threads to update the blocks where ratings in different blocks don’t share $p$ or $q$.
[Figure: the $6 \times 6$ rating matrix split into blocks]
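The independence condition on this slide can be written down directly: two blocks can be updated concurrently when they share neither a row block nor a column block. The sketch below assumes an even 3 × 3 split of a 6 × 6 matrix; the split and the helper names are illustrative only.

```python
def block_of(u, v, m, n, parts):
    """Map 0-based (user, item) indices to a (row block, column block) pair."""
    return (u * parts // m, v * parts // n)

def independent(b1, b2):
    """Blocks sharing no row block and no column block touch disjoint p and q."""
    return b1[0] != b2[0] and b1[1] != b2[1]

m = n = 6
parts = 3
# Indices are 0-based here, so r_{3,3} is (2, 2), r_{3,6} is (2, 5), r_{6,6} is (5, 5).
b33 = block_of(2, 2, m, n, parts)
b36 = block_of(2, 5, m, n, parts)
b66 = block_of(5, 5, m, n, parts)

can_run_together = independent(b33, b66)   # different users and different items
must_wait = not independent(b33, b36)      # both need p_3
```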

SG for Parallel MF (Cont’d)
• This concept of splitting data into independent blocks seems to work
• However, getting the implementation right on a given architecture raises many issues

Matrix factorization: Our approach in the package LIBMF

Our approach in the package LIBMF
• Parallelization (Zhuang et al., 2013; Chin et al., 2015a)
   • Effective block splitting to avoid synchronization time
   • Partial random method for the order of SG updates
• Adaptive learning rate for SG updates (Chin et al., 2015b)
   • Details omitted due to time constraints

Block Splitting and Synchronization
• A naive way for $T$ nodes is to split the matrix into $T \times T$ blocks
• This is used in DSGD (Gemulla et al., 2011) for distributed systems. The setting is reasonable because communication cost is the main concern
• In distributed systems, it is difficult to move data or model

Block Splitting and Synchronization (Cont’d)
However, for shared-memory systems, synchronization is a concern.
We have 3 threads, one per block:
• Block 1: 20s
• Block 2: 10s
• Block 3: 20s
Thread   0 → 10   10 → 20
1        Busy     Busy
2        Busy     Idle
3        Busy     Busy
10s wasted!!
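The wasted time in this example follows from simple arithmetic: with a synchronization point after each round, every thread waits for the slowest block. A small sketch, with a hypothetical helper name:

```python
def barrier_idle_time(block_seconds):
    """Total idle time when all threads wait for the slowest block to finish."""
    slowest = max(block_seconds)
    return sum(slowest - t for t in block_seconds)

# The three blocks from the slide: 20s, 10s and 20s.
idle = barrier_idle_time([20, 10, 20])
```

Here only thread 2 finishes early, so the idle time is its 10 seconds between the end of its block and the end of the round.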

Lock-Free Scheduling
We split the matrix into enough blocks. For example, with two threads, we split the matrix into $4 \times 4$ blocks.
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
Each 0 is an update counter recording the number of times the block has been updated.

Lock-Free Scheduling (Cont’d)
First, T1 selects a block randomly.
[Figure: the $4 \times 4$ grid of counters, all 0, with one block marked T1]

Lock-Free Scheduling (Cont’d)
T2 then randomly selects a block that is neither green nor gray, that is, a block sharing no row or column with the block T1 is updating.
[Figure: the $4 \times 4$ grid of counters, all 0, with one block marked T1 and another marked T2]

Lock-Free Scheduling (Cont’d)
After T1 finishes, the counter for the corresponding block is increased by one.
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
[Figure: the grid above, with T2 still working on its block]

Lock-Free Scheduling (Cont’d)
T1 can then select one of the available blocks to update.
Rule: select one that is least updated.
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
[Figure: the grid above, with T2 still working on its block]
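The selection rule walked through on the last few slides can be simulated in a few lines. The sketch below is a single-threaded toy, not LIBMF's scheduler: the class and method names are hypothetical, ties are broken at random, and it assumes the grid has more row and column blocks than active threads so a free block always exists.

```python
import random

class BlockScheduler:
    """Toy scheduler: hands out a free block with the smallest update counter."""

    def __init__(self, parts, seed=0):
        self.parts = parts
        self.count = [[0] * parts for _ in range(parts)]
        self.busy_rows = set()
        self.busy_cols = set()
        self.rng = random.Random(seed)

    def get_block(self):
        # A block is free if no thread is working in its row or its column.
        free = [(i, j)
                for i in range(self.parts) for j in range(self.parts)
                if i not in self.busy_rows and j not in self.busy_cols]
        least = min(self.count[i][j] for i, j in free)
        candidates = [(i, j) for i, j in free if self.count[i][j] == least]
        i, j = self.rng.choice(candidates)
        self.busy_rows.add(i)
        self.busy_cols.add(j)
        return (i, j)

    def put_block(self, block):
        # Called when a thread finishes a block: bump its counter and release it.
        i, j = block
        self.count[i][j] += 1
        self.busy_rows.remove(i)
        self.busy_cols.remove(j)

sched = BlockScheduler(parts=4)
b1 = sched.get_block()   # T1 picks at random (all counters are 0)
b2 = sched.get_block()   # T2 must avoid T1's row and column
sched.put_block(b1)      # T1 finishes; that block's counter becomes 1
b3 = sched.get_block()   # T1 asks again and gets a least-updated free block
```

Because a thread only ever waits for the scheduler to hand out a block, not for other threads to finish a round, no thread sits idle at a barrier.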

Lock-Free Scheduling (Cont’d)
• SG: applying Lock-Free Scheduling
• SG**: applying DSGD-like Scheduling
[Figure: RMSE versus Time(s) for SG and SG** on MovieLens 10M and Yahoo!Music]
• MovieLens 10M: 18.71s → 9.72s (RMSE: 0.835)
• Yahoo!Music: 728.23s → 462.55s (RMSE: 21.985)

Memory Discontinuity
Discontinuous memory access can dramatically increase the training time. For SG, two possible update orders are:
Update order   Advantages          Disadvantages
Random         Faster and stable   Memory discontinuity
Sequential     Memory continuity   Not stable
[Figure: sequential versus random access patterns over $R$]
Our lock-free scheduling gives randomness, but the resulting code may not be cache friendly.

Partial Random Method
Our solution is that for each block, access both $\hat{R}$ and $\hat{P}$ continuously.
[Figure: $\hat{R}$ (one block) $= \hat{P}^T \times \hat{Q}$; regions labeled 1 to 6]
• Partial: sequential in each block
• Random: random when selecting block
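One way to picture the resulting update order is the generator below. It is a sketch under stated assumptions: blocks come from an even split, the block order is a simple shuffle rather than the lock-free scheduler, and ratings inside a block are sorted by user index (then item index) so that the block's ratings and the corresponding columns of $P$ are visited in order. All names and the toy data are hypothetical.

```python
import random

def partial_random_order(ratings, m, n, parts, seed=0):
    """Yield ratings block by block: random block order, sorted inside each block."""
    blocks = {}
    for u, v, r in ratings:
        key = (u * parts // m, v * parts // n)
        blocks.setdefault(key, []).append((u, v, r))
    keys = list(blocks)
    random.Random(seed).shuffle(keys)        # random: which block comes next
    for key in keys:
        for rating in sorted(blocks[key]):   # partial: sequential within the block
            yield rating

# Toy data (hypothetical): 6 users, 6 items, split into 2 x 2 blocks.
ratings = [(5, 5, 1.0), (0, 1, 2.0), (0, 0, 3.0), (4, 4, 4.0), (1, 0, 5.0)]
order = list(partial_random_order(ratings, m=6, n=6, parts=2))
```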

Partial Random Method (Cont’d)
[Figure: RMSE versus Time(s) for Random and Partial Random on MovieLens 10M and Yahoo!Music]
The performance of Partial Random Method is better than that of Random Method.
