

SLIDE 1

Neighborhood-Enhanced Transfer Learning for One-Class Collaborative Filtering

Wanling Cai1,2, Jiongbin Zheng1, Weike Pan1∗, Jing Lin1, Lin Li1, Li Chen2, Xiaogang Peng1∗ and Zhong Ming1∗

cswlcai@comp.hkbu.edu.hk, jiongbin92@gmail.com, panweike@szu.edu.cn, linjing4@email.szu.edu.cn, lilin20171@email.szu.edu.cn, lichen@comp.hkbu.edu.hk, pengxg@szu.edu.cn, mingz@szu.edu.cn

1College of Computer Science and Software Engineering

Shenzhen University, Shenzhen, China

2Department of Computer Science

Hong Kong Baptist University, Hong Kong, China


SLIDE 2

Introduction

Problem Definition

One-Class Collaborative Filtering

Input: a set of (user, item) pairs P = {(u, i)}, where each (u, i) pair means that user u has given positive feedback on item i.

Goal: recommend to each user u ∈ U a personalized ranked list of items from the set of unobserved items, i.e., I\Pu.
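To make the input and output concrete, here is a minimal Python sketch of the data structures involved; the toy IDs and variable names are our illustration only, not anything from the paper.

```python
from collections import defaultdict

# P: observed positive (user, item) pairs; the IDs below are toy values.
P = {(0, 3), (0, 7), (1, 3), (2, 5)}
I = set(range(10))  # the item universe

# P_u: the items observed by each user u.
P_u = defaultdict(set)
for u, i in P:
    P_u[u].add(i)

# The candidate items to rank for user u are the unobserved ones, I \ P_u.
candidates = {u: I - P_u[u] for u in P_u}
```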


SLIDE 3

Introduction

Challenges

1. The sparsity of observed feedback.

2. The ambiguity of unobserved feedback.


SLIDE 4

Introduction

Overview of Our Solution

Figure: Illustration of our transfer learning solution

Transfer by Neighborhood-Enhanced Factorization (TNF)

We first extract the local knowledge of neighborhood information among users. We then transfer it to a global preference learning task in an enhanced factorization-based framework.


SLIDE 5

Introduction

Advantages of Our Solution

Our TNF is able to inherit the merits of the localized neighborhood-based methods and the globalized factorization-based methods.

Notice that neighborhood-based methods and factorization-based methods are rarely studied in one single framework or solution for OCCF.

The factored representation of users and items allows TNF to capture and model transitive relations within a group of close neighbors on datasets of low density.


SLIDE 6

Introduction

Notations

n: number of users
m: number of items
u ∈ U: user ID
i, i′ ∈ I: item ID
R = {(u, i)}: universe of all possible (user, item) pairs
P = {(u, i)}: the whole set of observed (user, item) pairs
A, |A| = ρ|P|: a sampled set of negative feedback from R\P
Iu: item set observed by user u
d: number of latent dimensions
bu ∈ ℝ: user bias
bi ∈ ℝ: item bias
Vi· ∈ ℝ^{1×d}: item-specific latent feature vector
Xu′· ∈ ℝ^{1×d}: user-specific latent feature vector
Nu: a set of nearest neighbors of user u
r̂ui: predicted preference of user u to item i
αx, αv, βu, βv: trade-off parameters on the regularization terms
γ: learning rate
T: iteration number in the algorithm


SLIDE 7

Method

Neighborhood Construction

In order to extract the local knowledge from the records of users' behaviors, we first calculate the cosine similarity between user u and user w,

$s_{uw} = \frac{|I_u \cap I_w|}{\sqrt{|I_u|} \sqrt{|I_w|}}$,

where $|I_u|$, $|I_w|$ and $|I_u \cap I_w|$ denote the number of items observed by user u, by user w, and by both user u and user w, respectively. We can then obtain a set of the most similar users of each user u to construct a neighborhood $N_u$.
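As an illustration (not the authors' released code), the similarity and the neighborhood construction can be sketched as follows; the brute-force pairwise loop and the function names are our assumptions.

```python
import heapq

def cosine_sim(I_u, I_w):
    """s_uw = |I_u ∩ I_w| / (sqrt(|I_u|) * sqrt(|I_w|)) over sets of item IDs."""
    if not I_u or not I_w:
        return 0.0
    return len(I_u & I_w) / ((len(I_u) ** 0.5) * (len(I_w) ** 0.5))

def build_neighborhoods(I_of, k=20):
    """N_u: the k most similar users of each user u (k = 20 in the experiments)."""
    users = list(I_of)
    return {
        u: [w for _, w in heapq.nlargest(
                k, ((cosine_sim(I_of[u], I_of[w]), w) for w in users if w != u))]
        for u in users
    }
```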


SLIDE 8

Method

Assumption

We assume that the neighborhood knowledge extracted from local associations can be incorporated into a global factorization framework so as to better capture the latent representations. This process is analogous to human learning: people with intense concentration digest knowledge locally but effectively, while others with a big picture in mind are experts at building correlations across different domains or tasks. Learners who can exploit a key combination of local and global cues may achieve more.


SLIDE 9

Method

Transfer by Neighborhood-Enhanced Factorization

Specifically, a recent work [Guo et al., 2017] inspires us to aggregate the like-minded users' preferences. Finally, we have the estimated preference of user u to item i as follows,

$\hat{r}_{ui} = b_u + b_i + \frac{1}{|N_u|} \sum_{u' \in N_u} X_{u' \cdot} V_{i \cdot}^T$.  (1)

In this way, the local knowledge of neighborhood can be transferred into the factorization-based method. For this reason, we call it transfer by neighborhood-enhanced factorization (TNF). Notice that a closely related work, FISM [Kabbur et al., 2013], focuses on learning the factored item similarity by incorporating the knowledge of items that have been observed by user u (i.e., $I_u$).
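A minimal numpy sketch of Eq. (1), assuming X and V are n×d and m×d parameter matrices and N maps each user to a list of neighbor indices; the function name is ours.

```python
import numpy as np

def predict(u, i, N, X, V, b_user, b_item):
    """Eq. (1): r_hat_ui = b_u + b_i + (1/|N_u|) * sum_{u' in N_u} X_{u'.} V_{i.}^T."""
    U_bar = X[N[u]].mean(axis=0)  # virtual user vector aggregated over N_u
    return b_user[u] + b_item[i] + U_bar @ V[i]
```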


SLIDE 10

Method

Pointwise Preference Learning

In our TNF, we adopt pointwise preference learning as our preference learning paradigm. The objective function is as follows,

$\min_{\Theta} \sum_{(u,i) \in P \cup A} f_{ui} + R(\Theta)$,  (2)

where $f_{ui} = \log(1 + \exp(-r_{ui} \hat{r}_{ui}))$ is a loss function defined on a (u, i) pair, and $\Theta = \{X_{u' \cdot}, V_{i \cdot}, b_u, b_i;\ i = 1, \ldots, m,\ u, u' = 1, \ldots, n\}$ denotes the set of model parameters to be learned. Notice that we use $r_{ui} = 1$ and $r_{ui} = -1$ to denote positive and negative preference for an observed $(u, i) \in P$ pair and an unobserved $(u, i) \in A$ pair, respectively. In addition, we introduce the regularization term

$R(\Theta) = \frac{\alpha_x}{2} \sum_{u' \in N_u} \|X_{u' \cdot}\|_F^2 + \frac{\alpha_v}{2} \|V_{i \cdot}\|_F^2 + \frac{\beta_u}{2} b_u^2 + \frac{\beta_v}{2} b_i^2$

to help avoid overfitting, where $\alpha_x$, $\alpha_v$, $\beta_u$ and $\beta_v$ are trade-off hyperparameters.
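The loss itself is one line; a sketch follows, using np.logaddexp for numerical stability, which is our implementation choice rather than something the slide specifies.

```python
import numpy as np

def pointwise_loss(r_ui, r_hat_ui):
    """f_ui = log(1 + exp(-r_ui * r_hat_ui)); r_ui is +1 on P and -1 on A."""
    return np.logaddexp(0.0, -r_ui * r_hat_ui)
```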


SLIDE 11

Method

Gradients

In order to solve the optimization problem in Eq. (2), we adopt the commonly used stochastic gradient descent (SGD) algorithm. Specifically, for each $(u, i) \in P \cup A$, we have the gradients,

$\nabla X_{u' \cdot} = \frac{\partial f_{ui}}{\partial X_{u' \cdot}} = -e_{ui} \frac{1}{|N_u|} V_{i \cdot} + \alpha_x X_{u' \cdot}, \quad u' \in N_u$,  (3)

$\nabla V_{i \cdot} = \frac{\partial f_{ui}}{\partial V_{i \cdot}} = -e_{ui} \frac{1}{|N_u|} \sum_{u' \in N_u} X_{u' \cdot} + \alpha_v V_{i \cdot}$,  (4)

$\nabla b_u = \frac{\partial f_{ui}}{\partial b_u} = -e_{ui} + \beta_u b_u$,  (5)

$\nabla b_i = \frac{\partial f_{ui}}{\partial b_i} = -e_{ui} + \beta_v b_i$,  (6)

where $e_{ui} = \frac{r_{ui}}{1 + \exp(r_{ui} \hat{r}_{ui})}$, and $\bar{U}_{u \cdot} = \frac{1}{|N_u|} \sum_{u' \in N_u} X_{u' \cdot}$ is a virtual user-specific latent feature vector of user u aggregated from the neighborhood $N_u$.


SLIDE 12

Method

Update Rules

For each $(u, i) \in P \cup A$, we have the update rules,

$X_{u' \cdot} = X_{u' \cdot} - \gamma \nabla X_{u' \cdot}, \quad u' \in N_u$,  (7)

$V_{i \cdot} = V_{i \cdot} - \gamma \nabla V_{i \cdot}$,  (8)

$b_u = b_u - \gamma \nabla b_u$,  (9)

$b_i = b_i - \gamma \nabla b_i$,  (10)

where $\gamma > 0$ is the learning rate.
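Putting Eqs. (3)-(10) together, one SGD step for a sampled (u, i) pair might look like the following sketch (in-place numpy updates; the default hyperparameter values are placeholders, not the tuned ones).

```python
import numpy as np

def sgd_step(u, i, r_ui, N, X, V, b_user, b_item,
             alpha_x=0.01, alpha_v=0.01, beta_u=0.01, beta_v=0.01, gamma=0.01):
    nbrs = N[u]
    U_bar = X[nbrs].mean(axis=0)                       # (1/|N_u|) sum X_{u'.}
    r_hat = b_user[u] + b_item[i] + U_bar @ V[i]       # Eq. (1)
    e_ui = r_ui / (1.0 + np.exp(r_ui * r_hat))         # shared residual term

    grad_V = -e_ui * U_bar + alpha_v * V[i]            # Eq. (4)
    for w in nbrs:                                     # Eqs. (3) and (7)
        X[w] -= gamma * (-e_ui * V[i] / len(nbrs) + alpha_x * X[w])
    V[i] -= gamma * grad_V                             # Eq. (8)
    b_user[u] -= gamma * (-e_ui + beta_u * b_user[u])  # Eqs. (5) and (9)
    b_item[i] -= gamma * (-e_ui + beta_v * b_item[i])  # Eqs. (6) and (10)
```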


SLIDE 13

Method

Algorithm

1: Input: Observations P
2: Output: Recommended items for each user
3: Initialize model parameters Θ
4: Construct a neighborhood $N_u$ for each user u
5: for t1 = 1, . . . , T do
6:    Randomly pick a set A with |A| = ρ|P|
7:    for t2 = 1, 2, . . . , |P ∪ A| do
8:        Randomly draw a (u, i) pair from P ∪ A
9:        Calculate $\bar{U}_{u \cdot} = \frac{1}{|N_u|} \sum_{u' \in N_u} X_{u' \cdot}$
10:       Calculate $\hat{r}_{ui} = b_u + b_i + \bar{U}_{u \cdot} V_{i \cdot}^T$
11:       Calculate $e_{ui} = \frac{r_{ui}}{1 + \exp(r_{ui} \hat{r}_{ui})}$
12:       Update $b_u$, $b_i$, $V_{i \cdot}$ and $X_{u' \cdot}$ for $u' \in N_u$
13:   end for
14: end for

Notes: randomly drawing a (u, i) pair from P ∪ A is more efficient than the user-wise sampling strategy in [Pan and Chen, 2013].
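A Python sketch of the loop above, reusing sgd_step from the previous slide's sketch; the uniform negative sampler and the shuffled pass standing in for line 8's random draws are our assumptions about details the slide leaves open.

```python
import random

def train(P, I_of, all_items, N, X, V, b_user, b_item, rho=3, T=100, **hp):
    users, items = list(I_of), list(all_items)
    positives = list(P)
    for _ in range(T):                            # line 5
        A = set()                                 # line 6: |A| = rho * |P| negatives
        while len(A) < rho * len(positives):
            u, i = random.choice(users), random.choice(items)
            if i not in I_of[u]:                  # keep only unobserved pairs
                A.add((u, i))
        data = [(u, i, +1) for u, i in positives] + [(u, i, -1) for u, i in A]
        random.shuffle(data)                      # stands in for line 8's random draws
        for u, i, r_ui in data:                   # lines 7-13
            sgd_step(u, i, r_ui, N, X, V, b_user, b_item, **hp)
```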


SLIDE 14

Experiments

Datasets

Table: Statistics of the datasets used in the experiments, including the number of users (n), the number of items (m), the number of (user, item) pairs in the training data (|P|), the number of (user, item) pairs in the test data (|Pte|), and the density of each dataset, i.e., (|P| + |Pte|)/(nm).

Dataset      n      m      |P|      |Pte|    (|P| + |Pte|)/(nm)
ML100K       943    1,682  27,688   27,687   3.49%
ML1M         6,040  3,952  287,641  287,640  2.41%
UserTag      3,000  2,000  123,218  123,218  4.11%
Netflix5K5K  5,000  5,000  77,936   77,936   0.62%
XING5K5K     5,000  5,000  39,197   39,197   0.31%
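As a quick arithmetic check of the density column, e.g., for ML100K:

```python
n, m = 943, 1682
num_P, num_Pte = 27688, 27687
print(f"{(num_P + num_Pte) / (n * m):.2%}")  # 3.49%
```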

Notice that the datasets and code are publicly available.¹

¹ http://csse.szu.edu.cn/staff/panwk/publications/TNF/

SLIDE 15

Experiments

Baselines

UCF: user-oriented collaborative filtering [Aggarwal et al., 1999]
MF: matrix factorization with square loss [Koren et al., 2009]
BPR: Bayesian personalized ranking [Rendle et al., 2009]
FISM: factored item similarity model [Kabbur et al., 2013]
NeuMF: neural matrix factorization [He et al., 2017]


SLIDE 16

Experiments

Parameter Configurations (1/2)

For UCF and TNF, we use the cosine similarity and set the size of the neighborhood to 20.

For BPR, FISM and our TNF, we adopt the commonly used stochastic gradient descent (SGD) method with the same sampling strategy for a fair comparison, and we fix the number of latent dimensions as d = 20 and the learning rate as γ = 0.01.

For FISM and TNF, we set ρ = 3, i.e., |A| = 3|P|.

For the deep model NeuMF, we implement the method using TensorFlow² and keep the structure with the best performance as reported in [He et al., 2017].

² https://www.tensorflow.org/

SLIDE 17

Experiments

Parameter Configurations (2/2)

For each factorization-based algorithm on each dataset, we search the trade-off parameters over αx = αv = βu = βv ∈ {0.1, 0.01, 0.001} and find an optimal iteration number T ∈ {10, 20, 30, . . . , 990, 1000} by checking the NDCG@5 performance on the validation data every ten iterations.
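A sketch of this search procedure; init_model, train_for, evaluate_ndcg5 and valid_data are hypothetical helpers standing in for the actual training and evaluation code.

```python
best_score, best_config = -1.0, None
for reg in (0.1, 0.01, 0.001):  # alpha_x = alpha_v = beta_u = beta_v = reg
    model = init_model(alpha_x=reg, alpha_v=reg, beta_u=reg, beta_v=reg)  # assumed helper
    for t in range(10, 1001, 10):                  # T in {10, 20, ..., 1000}
        train_for(model, iterations=10)            # assumed: run 10 more SGD iterations
        score = evaluate_ndcg5(model, valid_data)  # assumed: NDCG@5 on validation data
        if score > best_score:
            best_score, best_config = score, (reg, t)
```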


SLIDE 18

Experiments

Evaluation Metrics

Top-5 ranking-oriented evaluation metrics: Precision@5, Recall@5, F1@5, NDCG@5, 1-call@5.

For each method, we calculate the performance on warm-start users and warm-start items.

In parameter search, warm-start users (or items) denote the users (or items) shared between the training data and the validation data.

In the final test, warm-start users (or items) denote the users (or items) shared between the training data and the test data.
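A sketch of the per-user top-5 metrics, assuming ranked is a model's top-5 list for one user and relevant is that user's test items; the standard definitions are used, as the slide does not spell them out.

```python
import math

def metrics_at_5(ranked, relevant):
    hits = [1 if i in relevant else 0 for i in ranked[:5]]
    prec = sum(hits) / 5
    rec = sum(hits) / len(relevant) if relevant else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    dcg = sum(h / math.log2(pos + 2) for pos, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(5, len(relevant))))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    one_call = 1.0 if any(hits) else 0.0   # 1 if at least one hit in the top-5
    return prec, rec, f1, ndcg, one_call
```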


SLIDE 19

Experiments

Main Results (1/2)

Table: Recommendation performance of UCF, MF, BPR, FISM, NeuMF and our TNF on five real-world datasets. The significantly best results are marked in bold.

Dataset  Method  Prec@5         Rec@5          F1@5           NDCG@5         1-call@5
ML100K   UCF     0.3448±0.0020  0.0867±0.0012  0.1197±0.0011  0.3647±0.0049  0.7940±0.0197
         MF      0.3669±0.0086  0.0983±0.0045  0.1348±0.0053  0.3842±0.0078  0.8303±0.0107
         BPR     0.3504±0.0065  0.0915±0.0043  0.1274±0.0041  0.3670±0.0069  0.8082±0.0161
         FISM    0.4011±0.0032  0.1009±0.0011  0.1401±0.0011  0.4161±0.0037  0.8370±0.0059
         NeuMF   0.3648±0.0085  0.0936±0.0048  0.1293±0.0054  0.3789±0.0094  0.8057±0.0176
         TNF     0.4118±0.0080  0.1052±0.0031  0.1452±0.0042  0.4316±0.0084  0.8538±0.0134
ML1M     UCF     0.3705±0.0026  0.0615±0.0011  0.0942±0.0012  0.3855±0.0024  0.8090±0.0004
         MF      0.4174±0.0005  0.0704±0.0007  0.1080±0.0005  0.4306±0.0010  0.8437±0.0045
         BPR     0.4180±0.0039  0.0665±0.0008  0.1030±0.0011  0.4300±0.0040  0.8202±0.0049
         FISM    0.4241±0.0013  0.0727±0.0005  0.1114±0.0005  0.4388±0.0018  0.8478±0.0046
         NeuMF   0.3995±0.0105  0.0658±0.0019  0.1011±0.0026  0.4143±0.0100  0.8176±0.0105
         TNF     0.4602±0.0044  0.0781±0.0011  0.1193±0.0013  0.4781±0.0040  0.8662±0.0019
UserTag  UCF     0.2524±0.0028  0.0400±0.0005  0.0624±0.0013  0.2619±0.0028  0.5757±0.0093
         MF      0.2957±0.0022  0.0456±0.0012  0.0722±0.0015  0.3032±0.0024  0.6146±0.0077
         BPR     0.2883±0.0034  0.0439±0.0012  0.0695±0.0016  0.2959±0.0039  0.5978±0.0009
         FISM    0.2797±0.0089  0.0413±0.0022  0.0658±0.0031  0.2871±0.0064  0.5686±0.0055
         NeuMF   0.2943±0.0076  0.0462±0.0008  0.0731±0.0013  0.3021±0.0086  0.6049±0.0100
         TNF     0.3195±0.0018  0.0513±0.0013  0.0802±0.0013  0.3320±0.0030  0.6367±0.0014

SLIDE 20

Experiments

Main Results (2/2)

Table: Recommendation performance of UCF, MF, BPR, FISM, NeuMF and our TNF on five real-world datasets (continued). The significantly best results are marked in bold.

Dataset      Method  Prec@5         Rec@5          F1@5           NDCG@5         1-call@5
Netflix5K5K  UCF     0.1939±0.0016  0.0657±0.0013  0.0780±0.0008  0.2112±0.0029  0.5221±0.0026
             MF      0.2239±0.0029  0.0935±0.0012  0.1056±0.0014  0.2390±0.0046  0.6125±0.0050
             BPR     0.2488±0.0030  0.0919±0.0013  0.1075±0.0013  0.2650±0.0040  0.6138±0.0034
             FISM    0.2568±0.0048  0.1033±0.0034  0.1178±0.0027  0.2754±0.0057  0.6521±0.0130
             NeuMF   0.2293±0.0078  0.0848±0.0016  0.0987±0.0033  0.2463±0.0077  0.5847±0.0143
             TNF     0.2775±0.0008  0.1075±0.0022  0.1235±0.0013  0.3012±0.0023  0.6579±0.0019
XING5K5K     UCF     0.0741±0.0012  0.0370±0.0017  0.0386±0.0012  0.0828±0.0014  0.2343±0.0033
             MF      0.0720±0.0026  0.0301±0.0013  0.0346±0.0018  0.0773±0.0026  0.2247±0.0086
             BPR     0.0674±0.0022  0.0256±0.0017  0.0306±0.0017  0.0714±0.0030  0.2025±0.0060
             FISM    0.0835±0.0022  0.0379±0.0009  0.0427±0.0013  0.0898±0.0022  0.2648±0.0095
             NeuMF   0.0481±0.0024  0.0166±0.0006  0.0195±0.0009  0.0507±0.0033  0.1347±0.0042
             TNF     0.0869±0.0017  0.0407±0.0012  0.0447±0.0008  0.0960±0.0027  0.2689±0.0070

SLIDE 21

Experiments

Observations (1/3)

TNF performs significantly better than all the five baselines on all the five evaluation metrics across the five datasets, which clearly shows the effectiveness of our transfer learning solution.

TNF performs much better than the neighborhood-based method, i.e., UCF, in all cases, which showcases the effectiveness of the second task of global preference learning in our TNF.


SLIDE 22

Experiments

Observations (2/3)

TNF is considerably better than the typical globalized factorization-based methods, i.e., MF and BPR, in terms of all evaluation metrics, indicating that it is more effective to learn global preferences by leveraging the local knowledge transferred from the first task of neighborhood construction.


SLIDE 23

Experiments

Observations (3/3)

TNF beats the four very strong baseline methods, i.e., MF, BPR, FISM and NeuMF, in all cases, which showcases the merit of our proposed solution in exploiting the complementarity of the neighborhood-based method and the factorization-based method in a unified framework.

In particular, TNF performs significantly better than FISM, which shows the usefulness of the local knowledge as exploited in the second task of TNF.


SLIDE 24

Related Work

Related Work

1. Neighborhood-based recommendation
   user-oriented methods [Aggarwal et al., 1999]
   item-oriented methods [Deshpande and Karypis, 2004]

2. Factorization-based recommendation
   matrix factorization (MF) [Hu et al., 2008, Johnson, 2014]
   factored item similarity model (FISM) [Kabbur et al., 2013]

3. Deep learning based recommendation
   collaborative denoising auto-encoder (CDAE) [Wu et al., 2016]
   neural matrix factorization (NeuMF) [He et al., 2017]


SLIDE 25

Conclusions and Future Work

Conclusions

We study an important collaborative filtering problem with users' one-class feedback, and design a novel transfer learning solution called transfer by neighborhood-enhanced factorization (TNF).

In our TNF, the local knowledge of the neighborhood among users is extracted from the users' behaviors, and is then transferred to a factorization-based global preference learning task in order to better capture the latent representations of users and items.


SLIDE 26

Conclusions and Future Work

Future Work

For future work, we are interested in studying the complementarity of the knowledge of the neighborhood (Nu) and that of the historically observed items (Iu), and in incorporating the mined knowledge into deep learning frameworks such as stacked denoising auto-encoders and multi-layer neural networks.


SLIDE 27

Thank you

Thank you!

We thank the handling editors and reviewers for their effort and constructive expert comments, and acknowledge the support of the National Natural Science Foundation of China under Grant Nos. 61872249, 61502307, 61836005 and 61672358.


SLIDE 28

References

Aggarwal, C. C., Wolf, J. L., Wu, K.-L., and Yu, P. S. (1999). Horting hatches an egg: A new graph-theoretic approach to collaborative filtering. KDD '99, pages 201–212.

Deshpande, M. and Karypis, G. (2004). Item-based top-n recommendation algorithms. ACM Transactions on Information Systems, 22(1):143–177.

Guo, G., Zhang, J., Zhu, F., and Wang, X. (2017). Factored similarity models with social trust for top-n item recommendation. Knowledge-Based Systems, 122(C):17–25.

He, X., Liao, L., Zhang, H., Nie, L., Hu, X., and Chua, T.-S. (2017). Neural collaborative filtering. WWW '17, pages 173–182.

Hu, Y., Koren, Y., and Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. ICDM '08, pages 263–272.

Johnson, C. C. (2014). Logistic matrix factorization for implicit feedback data. NIPS '14, 27.

Kabbur, S., Ning, X., and Karypis, G. (2013). FISM: Factored item similarity models for top-n recommender systems. KDD '13, pages 659–667.

Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8):30–37.

Pan, W. and Chen, L. (2013). GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering. IJCAI '13, pages 2691–2697.

SLIDE 29

References

Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2009). BPR: Bayesian personalized ranking from implicit feedback. UAI '09, pages 452–461.

Wu, Y., DuBois, C., Zheng, A. X., and Ester, M. (2016). Collaborative denoising auto-encoders for top-n recommender systems. WSDM '16, pages 153–162.