SLIDE 1

Transfer Learning for Semi-Supervised Collaborative Recommendation

Weike Pan1, Qiang Yang2∗, Yuchao Duan1 and Zhong Ming1∗

panweike@szu.edu.cn, qyang@cse.ust.hk, duanyuchao@email.szu.edu.cn, mingz@szu.edu.cn 1College of Computer Science and Software Engineering

Shenzhen University, Shenzhen, China

2Department of Computer Science and Engineering

Hong Kong University of Science and Technology, Hong Kong, China

Pan, Yang, Duan and Ming (SZU & HKUST) SSCR (sTL) ACM TiiS 1 / 27

SLIDE 2

Introduction

Problem Definition

Semi-Supervised Collaborative Recommendation (SSCR)

Input:

• Labeled feedback (or explicit feedback) R = {(u, i, rui)}: each rating rui is a real-valued label, and the corresponding (user, item) pair (u, i) is a featureless instance.
• Unlabeled feedback (or implicit feedback) O = {(u, i)}: each (user, item) pair (u, i) is an unlabeled instance without supervised information.

Goal: predict the preference of each (user, item) pair in the test data Rte.

SLIDE 3

Introduction

Challenges

• The heterogeneity challenge: how to integrate two different types of feedback (explicit and accurate preferences vs. implicit and uncertain preferences).
• The uncertainty challenge: how to identify likely-positive feedback from the unlabeled feedback, which carries high uncertainty w.r.t. users' true preferences.

SLIDE 4

Introduction

Overview of Our Solution

We map the SSCR problem to the transfer learning paradigm and design an iterative algorithm, Self-Transfer Learning (sTL), containing two basic steps:

1. Knowledge flow from the unlabeled feedback to the labeled feedback: we integrate the identified likely-positive unlabeled feedback into the learning task of labeled feedback.

2. Knowledge flow from the labeled feedback to the unlabeled feedback: we use the tentatively learned model to further identify likely-positive unlabeled feedback.

SLIDE 5

Introduction

Advantages of Our Solution

• The unlabeled-to-labeled knowledge flow and the labeled-to-unlabeled knowledge flow address the heterogeneity challenge and the uncertainty challenge, respectively.
• The iterative algorithm is able to achieve sufficient knowledge transfer between the labeled feedback and the unlabeled feedback.

SLIDE 6

Introduction

Notations (1/2)

Table: Some notations (part 1).

n: user number
m: item number
u: user ID
i, i′: item ID
rui: observed rating of (u, i)
r̂ui: predicted rating of (u, i)
R = {(u, i, rui)}: labeled feedback (training)
O = {(u, i)}: unlabeled feedback (training)
Rte = {(u, i, rui)}: labeled feedback (test)
Ĩu = {i}: examined items by user u
SLIDE 7

Introduction

Notations (2/2)

Table: Some notations (part 2).

µ ∈ R: global average rating value
bu ∈ R: user bias
bi ∈ R: item bias
d ∈ R: number of latent dimensions
Uu· ∈ R^{1×d}: user-specific feature vector
U ∈ R^{n×d}: user-specific feature matrix
Vi·, W^{(s)}_{i′·} ∈ R^{1×d}: item-specific feature vectors
V, W^{(s)} ∈ R^{m×d}: item-specific feature matrices
T, L: iteration numbers

SLIDE 8

Method

Prediction Rule of sTL

The predicted preference of user u on item i is

\hat{r}_{ui}^{(\ell)} = \mu + b_u + b_i + U_{u\cdot} V_{i\cdot}^T + \sum_{s=0}^{\ell} \bar{\tilde{U}}_{u\cdot}^{(s)} V_{i\cdot}^T,   (1)

where \bar{\tilde{U}}_{u\cdot}^{(s)} = \frac{1}{\sqrt{|\tilde{I}_u^{(s)}|}} \sum_{i' \in \tilde{I}_u^{(s)}} W_{i'\cdot}^{(s)}, with \tilde{I}_u^{(0)} = \tilde{I}_u and \tilde{I}_u^{(s)} \subseteq \tilde{I}_u. Note that when \ell = 0, the above prediction rule is exactly the same as that of SVD++.
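
As a minimal sketch (in Python, not from the original slides), the prediction rule in Eq. (1) can be written as follows. The data layout of `W_sets` and the SVD++-style 1/sqrt(|Ĩ|) normalization are assumptions for illustration.

```python
import numpy as np

def predict_stl(mu, b_u, b_i, U_u, V_i, W_sets):
    """Sketch of Eq. (1): mu + b_u + b_i + U_u V_i^T plus one "virtual
    user profile" term per identification round s = 0..l. `W_sets[s]`
    is assumed to hold the W^(s) rows of the items in I~_u^(s); the
    1/sqrt(|I~_u^(s)|) normalization mirrors SVD++."""
    score = mu + b_u + b_i + float(U_u @ V_i)
    for W_s in W_sets:                          # s = 0, ..., l
        if len(W_s) > 0:
            # virtual user profile: normalized sum of item factors
            U_bar = W_s.sum(axis=0) / np.sqrt(len(W_s))
            score += float(U_bar @ V_i)         # extra term of round s
    return score
```

With `W_sets` containing a single set, this coincides with the SVD++ prediction rule, matching the ℓ = 0 remark above; with `W_sets` empty, it reduces to plain MF.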

SLIDE 9

Method

Objective Function of sTL

The optimization problem is

\min_{I^{(\ell)}, \Theta^{(\ell)}} \sum_{u=1}^{n} \sum_{i=1}^{m} y_{ui} \left[ \frac{1}{2} (r_{ui} - \hat{r}_{ui}^{(\ell)})^2 + reg(\Theta^{(\ell)}) \right],   (2)

where I^{(\ell)} = \{\tilde{I}_u^{(s)}\}_{s=0}^{\ell} and \Theta^{(\ell)} = \{\mu, b_u, b_i, U_{u\cdot}, V_{i\cdot}, W_{i'\cdot}^{(s)}\}_{s=0}^{\ell} are the likely-to-prefer items to be identified and the model parameters to be learned, respectively. The regularization term

reg(\Theta^{(\ell)}) = \frac{\lambda}{2} \|U_{u\cdot}\|^2 + \frac{\lambda}{2} \|V_{i\cdot}\|^2 + \frac{\lambda}{2} b_u^2 + \frac{\lambda}{2} b_i^2 + \frac{\lambda}{2} \sum_{s=0}^{\ell} \sum_{i' \in \tilde{I}_u^{(s)}} \|W_{i'\cdot}^{(s)}\|^2 + \frac{\lambda}{2} \sum_{s=1}^{\ell} \sum_{i' \in \tilde{I}_u^{(s)}} \|W_{i'\cdot}^{(s)} - W_{i'\cdot}^{(0)}\|^2

is used to avoid overfitting. In particular, the term \sum_{s=1}^{\ell} \sum_{i' \in \tilde{I}_u^{(s)}} \|W_{i'\cdot}^{(s)} - W_{i'\cdot}^{(0)}\|^2 constrains W_{i'\cdot}^{(s)} to be similar to W_{i'\cdot}^{(0)}, which helps avoid overfitting when W_{i'\cdot}^{(s)} is associated with insufficient training data, i.e., when |\tilde{I}_u^{(s)}| is small.

SLIDE 10

Method

Learning the sTL (1/3)

For the first step of unlabeled-to-labeled knowledge flow, we adopt a gradient descent algorithm to learn the model parameters. We denote g_{ui} = \frac{1}{2}(r_{ui} - \hat{r}_{ui}^{(\ell)})^2 + reg(\Theta^{(\ell)}) and have the gradient

\frac{\partial g_{ui}}{\partial \theta} = -(r_{ui} - \hat{r}_{ui}^{(\ell)}) \frac{\partial \hat{r}_{ui}^{(\ell)}}{\partial \theta} + \frac{\partial reg(\Theta^{(\ell)})}{\partial \theta},   (3)

where \theta can be \mu, b_u, b_i, U_{u\cdot}, V_{i\cdot} and W_{i'\cdot}^{(s)}. The gradients thus include

\frac{\partial g_{ui}}{\partial \mu} = -e_{ui}, \quad \frac{\partial g_{ui}}{\partial b_u} = -e_{ui} + \lambda b_u, \quad \frac{\partial g_{ui}}{\partial b_i} = -e_{ui} + \lambda b_i,

\frac{\partial g_{ui}}{\partial U_{u\cdot}} = -e_{ui} V_{i\cdot} + \lambda U_{u\cdot}, \quad \frac{\partial g_{ui}}{\partial V_{i\cdot}} = -e_{ui} (U_{u\cdot} + \sum_{s=0}^{\ell} \bar{\tilde{U}}_{u\cdot}^{(s)}) + \lambda V_{i\cdot},

\frac{\partial g_{ui}}{\partial W_{i'\cdot}^{(s)}} = -e_{ui} \frac{1}{\sqrt{|\tilde{I}_u^{(s)}|}} V_{i\cdot} + \lambda W_{i'\cdot}^{(s)} + \lambda (W_{i'\cdot}^{(s)} - W_{i'\cdot}^{(0)}),

with i' \in \tilde{I}_u^{(s)}, s = 0, \ldots, \ell. Note that e_{ui} = r_{ui} - \hat{r}_{ui}^{(\ell)} denotes the difference between the true rating and the predicted rating.

SLIDE 11

Method

Learning the sTL (2/3)

We then have the update rule for each model parameter,

\theta = \theta - \gamma \frac{\partial g_{ui}}{\partial \theta},   (4)

where \theta again can be \mu, b_u, b_i, U_{u\cdot}, V_{i\cdot} and W_{i'\cdot}^{(s)}, and \gamma > 0 is the step size (learning rate) used when updating the model parameters.
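
The update rule above can be sketched in a few lines of Python. This hedged sketch restricts Eqs. (3)-(4) to the ℓ = 0 (MF-style) terms; the W^{(s)} updates follow the same pattern, and the `params` dict layout is a hypothetical choice for illustration.

```python
import numpy as np

def sgd_step(r_ui, params, lam=0.01, gamma=0.01):
    """One stochastic gradient step per Eqs. (3)-(4), for the bias and
    factor parameters only. `params` holds mu, b_u, b_i, U_u, V_i."""
    mu, b_u, b_i = params["mu"], params["b_u"], params["b_i"]
    U_u, V_i = params["U_u"], params["V_i"]
    r_hat = mu + b_u + b_i + float(U_u @ V_i)   # l = 0 prediction (no W terms)
    e_ui = r_ui - r_hat                         # prediction error e_ui
    # gradient descent: theta <- theta - gamma * dg/dtheta
    params["mu"]  = mu  - gamma * (-e_ui)
    params["b_u"] = b_u - gamma * (-e_ui + lam * b_u)
    params["b_i"] = b_i - gamma * (-e_ui + lam * b_i)
    params["U_u"] = U_u - gamma * (-e_ui * V_i + lam * U_u)
    params["V_i"] = V_i - gamma * (-e_ui * U_u + lam * V_i)
    return e_ui
```

Each update uses the parameter values from before the step, as in standard SGD; repeated steps shrink the error e_ui toward zero on a single record.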

SLIDE 12

Method

Learning the sTL (3/3)

For the second step of labeled-to-unlabeled knowledge flow, we use the latest learned model parameters and the accumulated identified items, i.e., I^{(s)} and \Theta^{(s)}, to construct \tilde{I}_u^{(s+1)} for each user u:

• we estimate the preference of user u on item i for each (u, i) \in O, i.e., \hat{r}_{ui}^{(s)}, via the prediction rule in Eq. (1);
• we remove the (user, item) pair (u, i) from O and put the item i in \tilde{I}_u^{(s+1)} if \hat{r}_{ui}^{(s)} > r_0, where r_0 is a predefined threshold.

Note that with the newly identified item set \tilde{I}_u^{(s+1)}, we can integrate these items into the learning task of labeled feedback again.
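
The two bullets above amount to a single threshold scan over the unlabeled pairs. A minimal sketch, with `predict_fn` as a hypothetical stand-in for the prediction rule in Eq. (1):

```python
def select_likely_positive(O, predict_fn, r0=3.5):
    """Sketch of Step 2 (labeled-to-unlabeled knowledge flow): keep the
    (u, i) pairs whose predicted rating exceeds the threshold r0, and
    remove them from the unlabeled pool O."""
    identified = {}                      # user -> newly identified items
    remaining = []                       # pairs left in O
    for (u, i) in O:
        if predict_fn(u, i) > r0:
            identified.setdefault(u, set()).add(i)
        else:
            remaining.append((u, i))
    return identified, remaining
```

The returned `identified` sets play the role of the \tilde{I}_u^{(s+1)} above and feed back into the next round of Step 1.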

SLIDE 13

Method

Algorithm (1/2)

1: Input: labeled and unlabeled feedback R, O; tradeoff parameter λ, threshold r0, latent dimension number d, and iteration numbers L, T.
2: Output: learned model parameters Θ^(L) and identified likely-to-prefer items Ĩu^(s), s = 1, …, L.
3: Initialization: initialize the item set Ĩu^(0) = Ĩu for each user u.
4: for ℓ = 0, …, L do
5:     (see the details on the next slide)
6: end for

SLIDE 14

Method

Algorithm (2/2)

1: // Step 1: Unlabeled-to-labeled knowledge flow
2: Set the learning rate γ = 0.01 and initialize the model parameters Θ^(ℓ)
3: for t = 1, …, T do
4:     for t2 = 1, …, |R| do
5:         Randomly pick a rating record (u, i, rui) from R
6:         Calculate the gradients ∂gui/∂θ
7:         Update the model parameters θ
8:     end for
9:     Decrease the learning rate γ ← γ × 0.9
10: end for
11: // Step 2: Labeled-to-unlabeled knowledge flow
12: if ℓ < L then
13:     for u = 1, …, n do
14:         Predict the preference r̂^(ℓ)_{ui′} for i′ ∈ Ĩu \ ∪^ℓ_{s=1} Ĩu^(s)
15:         Select the likely-to-prefer items from Ĩu \ ∪^ℓ_{s=1} Ĩu^(s) with r̂_{ui′} > r0 and save them as Ĩu^(ℓ+1)
16:     end for
17: end if
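
The outer loop of the algorithm above can be sketched compactly as follows. The three callables are hypothetical stand-ins wired in for illustration: `sgd_step_fn` performs one update per Eq. (4), `predict_fn` applies Eq. (1), and `select_fn` implements Step 2.

```python
import random

def stl_train(R, O, predict_fn, sgd_step_fn, select_fn, L=2, T=100):
    """Sketch of the sTL training loop: L + 1 rounds alternating
    Step 1 (SGD over the labeled feedback R) and Step 2 (identifying
    likely-positive pairs in the unlabeled feedback O)."""
    identified = []                      # item sets for s = 1, ..., L
    for l in range(L + 1):
        # Step 1: unlabeled-to-labeled knowledge flow (SGD over R)
        gamma = 0.01
        for t in range(T):
            for _ in range(len(R)):
                u, i, r = random.choice(R)
                sgd_step_fn(u, i, r, gamma)
            gamma *= 0.9                 # decay the learning rate
        # Step 2: labeled-to-unlabeled knowledge flow
        if l < L:
            new_items, O = select_fn(O, predict_fn)
            identified.append(new_items)
    return identified
```

With L = 0 the loop runs Step 1 once and never enters Step 2, matching the reduction to SVD++ discussed on the next slide.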

SLIDE 15

Method

Analysis

The whole algorithm iterates in L + 1 loops:

• When L = 0, the sTL algorithm reduces to a single step of unlabeled-to-labeled knowledge flow, which is the same as that of SVD++ using the whole unlabeled feedback without uncertainty reduction.
• When L = 0 and O = ∅, sTL further reduces to basic matrix factorization.

We illustrate the relationships among sTL, SVD++ and MF as follows:

sTL --(L = 0)--> SVD++ --(O = ∅)--> MF,   (5)

from which we can see that our sTL is a quite generic algorithm.

SLIDE 16

Experiments

Datasets

Table: Statistics of one copy of labeled feedback R, unlabeled feedback O and test records Rte of ML10M, Flixter and ML20M used in the experiments.

Labeled feedback: (u, i, rui), rui ∈ {0.5, 1, …, 5}; unlabeled feedback: (u, i); test feedback: (u, i, rui), rui ∈ {0.5, 1, …, 5}.

                            ML10M     Flixter   ML20M
User # (n)                  71567     147612    138493
Item # (m)                  10681     48794     26744
Labeled feedback # (|R|)    4000022   3278431   8000104
Unlabeled feedback # (|O|)  4000022   3278431   8000107
Test feedback # (|Rte|)     2000010   1639215   4000052

SLIDE 17

Experiments

Baselines

• Item-based collaborative filtering (ICF)
• Matrix factorization (MF)
• SVD with unlabeled feedback (SVD++)
• Factorization machine (FM)

SLIDE 18

Experiments

Initialization of Model Parameters

We use the statistics of the training data R to initialize the model parameters:

• For each entry of the matrices U, V and W^(s), i.e., Uuk, Vik and W^{(s)}_{i′k} with k = 1, …, d and s = 1, …, ℓ, we use a small random value (r − 0.5) × 0.01, where r ∈ [0, 1) is a random number.
• For the biases of user u and item i, i.e., bu and bi, we use bu = r̄u − µ and bi = r̄i − µ, where r̄u, r̄i and µ are user u's average rating, item i's average rating and the global average rating, respectively.
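
The initialization on this slide can be sketched as follows; the (u, i, r) triple layout of R and the 0-based ids are assumptions for illustration.

```python
import numpy as np

def init_params(R, n, m, d=20, rng=None):
    """Sketch of the initialization: small random factor entries
    (r - 0.5) * 0.01, and biases set to per-user / per-item average
    ratings minus the global average rating mu."""
    rng = rng or np.random.default_rng(0)
    U = (rng.random((n, d)) - 0.5) * 0.01        # user factors
    V = (rng.random((m, d)) - 0.5) * 0.01        # item factors
    ratings = np.array([r for (_, _, r) in R], dtype=float)
    mu = ratings.mean()                          # global average rating
    b_u = np.zeros(n)
    b_i = np.zeros(m)
    for u in range(n):
        ru = [r for (uu, _, r) in R if uu == u]
        if ru:
            b_u[u] = np.mean(ru) - mu            # user bias
    for i in range(m):
        ri = [r for (_, ii, r) in R if ii == i]
        if ri:
            b_i[i] = np.mean(ri) - mu            # item bias
    return mu, b_u, b_i, U, V
```

Users or items with no training ratings keep a zero bias in this sketch, a choice not specified on the slide.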

SLIDE 19

Experiments

Parameter Configurations

• For the number of latent dimensions d and the iteration number T, we set d = 20 and T = 100.
• For the tradeoff parameter λ, we search λ ∈ {0.001, 0.01, 0.1} using RMSE on the first copy of each data set (via sampling a holdout validation set with n records from the training data) and then fix it for the remaining two copies.
• For the threshold r0, we first set it close to the average rating of each data set, i.e., r0 = 3.5, and then study the impact of using smaller and bigger values.
• For the number of knowledge transfer steps L in our sTL, we first fix L = 2, and then study the performance with different values of L ∈ {0, 1, 2, 3, 4}.
• We set the number of neighbors to 50 in ICF.

SLIDE 20

Experiments

Post-Processing

When we estimate the preference of user u on item i, i.e., r̂ui, the predicted rating may fall outside the range of the labeled feedback in the training data, i.e., [0.5, 5] for the data sets in our experiments. For a predicted preference larger than 5 or smaller than 0.5, we adopt the following commonly used post-processing before the final evaluation:

\hat{r}_{ui} = \begin{cases} 0.5, & \text{if } \hat{r}_{ui} < 0.5 \\ 5, & \text{if } \hat{r}_{ui} > 5 \end{cases}   (6)
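
This post-processing is a plain clamp into the observed rating range; a one-line sketch:

```python
def clip_rating(r_hat, lo=0.5, hi=5.0):
    """Post-processing of Eq. (6): clamp a predicted rating into the
    observed rating range [lo, hi] before evaluation."""
    return max(lo, min(hi, r_hat))
```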

SLIDE 21

Experiments

Evaluation Metrics

• Mean Absolute Error (MAE): MAE = \sum_{(u,i,r_{ui}) \in R^{te}} |r_{ui} - \hat{r}_{ui}| / |R^{te}|
• Root Mean Square Error (RMSE): RMSE = \sqrt{\sum_{(u,i,r_{ui}) \in R^{te}} (r_{ui} - \hat{r}_{ui})^2 / |R^{te}|}
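
Both metrics can be computed in one pass over the test set; `predict_fn` is a hypothetical stand-in for the learned model:

```python
import math

def mae_rmse(test_records, predict_fn):
    """MAE and RMSE over (u, i, r) test triples, per the formulas on
    this slide: mean absolute error and root mean squared error."""
    abs_err, sq_err = 0.0, 0.0
    for (u, i, r) in test_records:
        e = r - predict_fn(u, i)
        abs_err += abs(e)
        sq_err += e * e
    n = len(test_records)
    return abs_err / n, math.sqrt(sq_err / n)
```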

SLIDE 22

Experiments

Results (1/2)

Table: The significantly best results are marked in bold font (the p-values are smaller than 0.01).

Data            Method  MAE             RMSE
ML10M (R)       ICF     0.6699±0.0003   0.8715±0.0004
                MF      0.6385±0.0008   0.8323±0.0011
ML10M (R, O)    SVD++   0.6249±0.0006   0.8182±0.0009
                FM      0.6276±0.0004   0.8181±0.0006
                sTL     0.6209±0.0004   0.8103±0.0007
Flixter (R)     ICF     0.6687±0.0007   0.9061±0.0010
                MF      0.6479±0.0007   0.8749±0.0010
Flixter (R, O)  SVD++   0.6400±0.0008   0.8683±0.0009
                FM      0.6447±0.0007   0.8701±0.0008
                sTL     0.6398±0.0006   0.8650±0.0008
ML20M (R)       ICF     0.6555±0.0002   0.8591±0.0004
                MF      0.6226±0.0005   0.8153±0.0007
ML20M (R, O)    SVD++   0.6122±0.0004   0.8033±0.0006
                FM      0.6120±0.0004   0.8036±0.0007
                sTL     0.6064±0.0002   0.7969±0.0004

SLIDE 23

Experiments

Results (2/2)

Observations:

• The proposed self-transfer learning (sTL) algorithm achieves better performance than all baselines in all cases. Such significant superiority in preference prediction clearly shows the advantage of the knowledge flow strategy designed in sTL to fully leverage the uncertain unlabeled feedback in an iterative manner.
• The overall ordering w.r.t. preference prediction performance is ICF < MF < SVD++, FM < sTL, which shows that (i) the uncertain unlabeled feedback is useful for preference learning, and (ii) SVD++ and FM are indeed very strong baselines for digesting labeled and unlabeled feedback in a principled way.

SLIDE 24

Related Work

Collaborative Recommendation

Table: Summary of some related works on collaborative recommendation, including supervised, unsupervised and semi-supervised collaborative recommendation settings for labeled feedback R, unlabeled feedback O, and heterogeneous feedback R and O, respectively.

Supervised (R):         ICF, etc.: memory-based methods
                        MF, etc.: model-based methods
Unsupervised (O):       iMF, etc.: with pointwise assumption
                        BPR, etc.: with pairwise assumption
Semi-Supervised (R, O): SVD++, FM, etc.: for the heterogeneity challenge
                        sTL (proposed): for the heterogeneity & uncertainty challenges

SLIDE 25

Related Work

Transfer Learning for Collaborative Recommendation

Most previous works on transfer learning for collaborative recommendation adopt one-time knowledge transfer, i.e., the algorithm contains only a single step of unlabeled-to-labeled knowledge flow. We generalize this commonly adopted one-time knowledge transfer approach and design a novel iterative knowledge transfer algorithm, i.e., self-transfer learning, aiming to address the heterogeneity and uncertainty challenges of the labeled and unlabeled feedback in one single framework.

SLIDE 26

Conclusion

Conclusion

• We study an important problem with both labeled feedback (explicit feedback) and unlabeled feedback (implicit feedback), i.e., semi-supervised collaborative recommendation (SSCR), in the transfer learning paradigm.
• We design a novel transfer learning algorithm, i.e., self-transfer learning (sTL), which is able to identify and integrate likely-positive unlabeled feedback into the learning task of labeled feedback in a principled and iterative manner.

SLIDE 27

Thank you

Thank you!

We thank the editors and reviewers for their expert comments and constructive suggestions. Weike Pan, Yuchao Duan and Zhong Ming thank the support of National Natural Science Foundation of China (NSFC) Nos. 61502307 and 61170077, Natural Science Foundation of Guangdong Province Nos. 2014A030310268 and 2016A030313038, and Natural Science Foundation of SZU No. 201436. Qiang Yang thanks the support of China National 973 project 2014CB340304, and Hong Kong CERG projects 16211214 and 16209715.
