

SLIDE 1

Dual Similarity Learning for Heterogeneous One-Class Collaborative Filtering

Xiancong Chen, Weike Pan, Zhong Ming

National Engineering Laboratory for Big Data System Computing Technology, Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China, chenxiancong@email.szu.edu.cn, {panweike,mingz}@szu.edu.cn

Chen, Pan and Ming (SZU) DSLM IEEE BigComp 2020 1 / 21

SLIDE 2

Introduction

Problem Definition

Problem: In this paper, we study the heterogeneous one-class collaborative filtering (HOCCF) problem.

Input: For a user u ∈ U, we have a set of purchased items, i.e., I^P_u, and a set of examined items, i.e., I^E_u.

Goal: Our goal is to exploit these two types of one-class feedback and recommend a ranked list of items for each user u.
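As a concrete illustration (not from the paper), the two types of one-class feedback can be organized as per-user item sets; the user and item IDs below are made up:

```python
from collections import defaultdict

# Hypothetical one-class feedback as (user, item) pairs.
purchases = [(1, 10), (1, 11), (2, 10)]      # target feedback R^P
examinations = [(1, 12), (2, 11), (2, 13)]   # auxiliary feedback R^E

def group_by_user(pairs):
    """Build {user: set of items} from (user, item) pairs."""
    sets = defaultdict(set)
    for u, i in pairs:
        sets[u].add(i)
    return dict(sets)

I_P = group_by_user(purchases)     # I^P_u: items purchased by user u
I_E = group_by_user(examinations)  # I^E_u: items examined by user u
```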

SLIDE 3

Introduction

Challenges

1. The sparsity of the target feedback.

2. The ambiguity and noise of the auxiliary feedback.

SLIDE 4

Introduction

Overview of Our Solution

[Figure: dual similarity between the target user/item and the purchased/examined neighbors, i.e., s_{i'i}, s_{ji}, s_{u'u}, s_{wu}.]

Dual similarity learning model (DSLM): Learn the similarity s_{i'i} between a target item i and a purchased item i', and the similarity s_{ji} between the target item i and an examined item j. Learn the similarity s_{u'u} between a target user u and a user u' who purchased item i, and the similarity s_{wu} between the target user u and a user w who examined item i.

SLIDE 5

Introduction

Advantages of Our Solution

By introducing the auxiliary feedback, DSLM is able to alleviate the sparsity problem to some extent. DSLM learns not only the similarity among items but also the similarity among users, which helps capture the correlations between users and items. DSLM strikes a good balance between the item-based similarity and the user-based similarity.

SLIDE 6

Introduction

Notations

Notation : Explanation
n : number of users
m : number of items
u, u', w ∈ {1, 2, ..., n} : user ID
i, i', j ∈ {1, 2, ..., m} : item ID
U = {u}, |U| = n : the whole set of users
I = {i}, |I| = m : the whole set of items
R^P = {(u, i)} : the whole set of purchases
R^E = {(u, i)} : the whole set of examinations
R^A = {(u, i)} : the set of absent pairs
I^P_u = {i | (u, i) ∈ R^P} : the set of purchased items w.r.t. u
I^E_u = {i | (u, i) ∈ R^E} : the set of examined items w.r.t. u
U^P_i = {u | (u, i) ∈ R^P} : the set of users that have purchased item i
U^E_i = {u | (u, i) ∈ R^E} : the set of users that have examined item i
U_{u·}, P_{u'·}, E_{w·} ∈ R^{1×d} : users' latent vectors
V_{i·}, P̃_{i'·}, Ẽ_{j·} ∈ R^{1×d} : items' latent vectors
b_u, b_i ∈ R : user bias and item bias
d : number of latent features
r̂_{ui} : predicted preference of user u on item i
ρ : sampling parameter
γ : learning rate
T, L, L_0 : iteration numbers
λ_*, α_*, β_* : tradeoff parameters

SLIDE 7

Background

Factored Item Similarity Model (FISM)

In FISM, we can estimate the preference of user u towards item i by aggregating the similarity between item i and all of its neighbors (i.e., I^P_u \ {i}), which is shown as follows,

\hat{r}_{ui} = \frac{1}{|\mathcal{I}^P_u \backslash \{i\}|} \sum_{i' \in \mathcal{I}^P_u \backslash \{i\}} \tilde{P}_{i' \cdot} V_{i \cdot}^\top, \qquad (1)

where we can regard the term \frac{1}{|\mathcal{I}^P_u \backslash \{i\}|} \sum_{i' \in \mathcal{I}^P_u \backslash \{i\}} \tilde{P}_{i' \cdot} as a certain virtual user profile w.r.t. the target feedback, denoting the distinct preference of user u.
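Eq. (1) can be sketched in a few lines of NumPy; the factor matrices and toy sizes below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 4                         # toy numbers of items and latent features
P_tilde = rng.normal(size=(m, d))   # item factors for the purchase history
V = rng.normal(size=(m, d))         # item factors for the target item

def fism_predict(purchased, i, P_tilde, V):
    """Eq. (1): average the factors of u's other purchases into a virtual
    user profile, then take its inner product with the target item factor."""
    neighbors = [j for j in purchased if j != i]
    if not neighbors:
        return 0.0
    virtual_user = P_tilde[neighbors].mean(axis=0)
    return float(virtual_user @ V[i])

score = fism_predict({0, 2, 5}, 2, P_tilde, V)
```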

SLIDE 8

Background

Transfer via Joint Similarity Learning (TJSL)

TJSL introduces a new similarity term in a similar way to that of FISM, through which the knowledge from the auxiliary feedback (e.g., examination actions) can be transferred. Then, the preference estimation of user u towards item i is as follows,

\sum_{i' \in \mathcal{I}^P_u \backslash \{i\}} s_{i'i} + \sum_{j \in \mathcal{I}^{E(\ell)}_u} s_{ji}, \quad \mathcal{I}^{E(\ell)}_u \subseteq \mathcal{I}^E_u, \qquad (2)

where \sum_{j \in \mathcal{I}^{E(\ell)}_u} s_{ji} = \frac{1}{|\mathcal{I}^{E(\ell)}_u|} \sum_{j \in \mathcal{I}^{E(\ell)}_u} \tilde{E}_{j \cdot} V_{i \cdot}^\top, and the term \frac{1}{|\mathcal{I}^{E(\ell)}_u|} \sum_{j \in \mathcal{I}^{E(\ell)}_u} \tilde{E}_{j \cdot} can be regarded as a virtual user profile w.r.t. the auxiliary feedback.

SLIDE 9

Method

Dual Similarity Learning Model (DSLM)

In TJSL, two types of similarities among items are learned. Symmetrically, we define the similarity among users as follows,

\sum_{u' \in \mathcal{U}^P_i \backslash \{u\}} s_{u'u} + \sum_{w \in \mathcal{U}^{E(\ell)}_i} s_{wu}, \quad \mathcal{U}^{E(\ell)}_i \subseteq \mathcal{U}^E_i, \qquad (3)

where \sum_{u' \in \mathcal{U}^P_i \backslash \{u\}} s_{u'u} = \frac{1}{|\mathcal{U}^P_i \backslash \{u\}|} \sum_{u' \in \mathcal{U}^P_i \backslash \{u\}} P_{u' \cdot} U_{u \cdot}^\top and \sum_{w \in \mathcal{U}^{E(\ell)}_i} s_{wu} = \frac{1}{|\mathcal{U}^{E(\ell)}_i|} \sum_{w \in \mathcal{U}^{E(\ell)}_i} E_{w \cdot} U_{u \cdot}^\top. Intuitively, we can also regard the terms \frac{1}{|\mathcal{U}^P_i \backslash \{u\}|} \sum_{u' \in \mathcal{U}^P_i \backslash \{u\}} P_{u' \cdot} and \frac{1}{|\mathcal{U}^{E(\ell)}_i|} \sum_{w \in \mathcal{U}^{E(\ell)}_i} E_{w \cdot} as the virtual item profiles w.r.t. the target feedback and auxiliary feedback, respectively.

SLIDE 10

Method

Prediction Rule

The predicted preference of user u on item i is

\hat{r}^{(\ell)}_{ui} = \sum_{i' \in \mathcal{I}^P_u \backslash \{i\}} s_{i'i} + \sum_{j \in \mathcal{I}^{E(\ell)}_u} s_{ji} + \sum_{u' \in \mathcal{U}^P_i \backslash \{u\}} s_{u'u} + \sum_{w \in \mathcal{U}^{E(\ell)}_i} s_{wu} + b_u + b_i, \quad \mathcal{I}^{E(\ell)}_u \subseteq \mathcal{I}^E_u, \ \mathcal{U}^{E(\ell)}_i \subseteq \mathcal{U}^E_i, \qquad (4)

where \mathcal{I}^{E(\ell)}_u is the set of likely-to-prefer items selected from \mathcal{I}^E_u, and \mathcal{U}^{E(\ell)}_i is the set of potential users that are likely to purchase item i, selected from \mathcal{U}^E_i.
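A minimal sketch of the prediction rule (4), with made-up toy sizes and randomly initialized factor matrices; the helper names (`avg`, `dslm_predict`) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 5, 6, 4                     # toy numbers of users, items, features
U = rng.normal(size=(n, d))           # target user factors
P = rng.normal(size=(n, d))           # purchasing-user factors
E = rng.normal(size=(n, d))           # examining-user factors
V = rng.normal(size=(m, d))           # target item factors
Pt = rng.normal(size=(m, d))          # purchased-item factors (P~)
Et = rng.normal(size=(m, d))          # examined-item factors (E~)
b_u = np.zeros(n); b_i = np.zeros(m)  # user and item biases

def avg(M, idx):
    """Mean of the rows of M indexed by idx, or zeros if idx is empty."""
    idx = list(idx)
    return M[idx].mean(axis=0) if idx else np.zeros(M.shape[1])

def dslm_predict(u, i, IP_u, IE_u, UP_i, UE_i):
    """Eq. (4): item-side and user-side similarity terms plus biases."""
    item_side = (avg(Pt, IP_u - {i}) + avg(Et, IE_u)) @ V[i]
    user_side = (avg(P, UP_i - {u}) + avg(E, UE_i)) @ U[u]
    return float(item_side + user_side + b_u[u] + b_i[i])

r_hat = dslm_predict(0, 2, IP_u={1, 2}, IE_u={3}, UP_i={0, 4}, UE_i={1})
```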

SLIDE 11

Method

Objective Function

The objective function of DSLM is as follows,

\min_{\Theta^{(\ell)}, \ \mathcal{I}^{E(\ell)}_u \subseteq \mathcal{I}^E_u, \ \mathcal{U}^{E(\ell)}_i \subseteq \mathcal{U}^E_i} \ \sum_{(u,i) \in \mathcal{R}^P \cup \mathcal{R}^A} f^{(\ell)}_{ui}, \qquad (5)

where

f^{(\ell)}_{ui} = \frac{1}{2} (r_{ui} - \hat{r}^{(\ell)}_{ui})^2 + \frac{\lambda_u}{2} \|U_{u \cdot}\|^2_F + \frac{\lambda_p}{2} \sum_{u' \in \mathcal{U}^P_i \backslash \{u\}} \|P_{u' \cdot}\|^2_F + \frac{\lambda_e}{2} \sum_{w \in \mathcal{U}^{E(\ell)}_i} \|E_{w \cdot}\|^2_F + \frac{\beta_u}{2} b_u^2 + \frac{\alpha_v}{2} \|V_{i \cdot}\|^2_F + \frac{\alpha_p}{2} \sum_{i' \in \mathcal{I}^P_u \backslash \{i\}} \|\tilde{P}_{i' \cdot}\|^2_F + \frac{\alpha_e}{2} \sum_{j \in \mathcal{I}^{E(\ell)}_u} \|\tilde{E}_{j \cdot}\|^2_F + \frac{\beta_v}{2} b_i^2,

and the model parameters are \Theta^{(\ell)} = \{U_{u \cdot}, P_{u' \cdot}, E_{w \cdot}, V_{i \cdot}, \tilde{P}_{i' \cdot}, \tilde{E}_{j \cdot}, b_u, b_i\}. Note that \mathcal{R}^A is the set of negative feedback used to complement the target feedback, where r_{ui} = 1 if (u, i) \in \mathcal{R}^P and r_{ui} = 0 otherwise.
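One summand f_ui of (5) is just a half squared error plus weighted L2/Frobenius penalties, which can be sketched generically; the function name and the `reg_terms` interface are hypothetical:

```python
import numpy as np

def pointwise_loss(r_ui, r_hat_ui, reg_terms):
    """One summand f_ui of Eq. (5): half squared error plus half-weighted
    L2/Frobenius penalties. reg_terms is a list of (tradeoff, parameter)
    pairs, e.g. [(lambda_u, U[u]), (alpha_v, V[i]), (beta_u, b_u[u])]."""
    loss = 0.5 * (r_ui - r_hat_ui) ** 2
    for coef, x in reg_terms:
        loss += 0.5 * coef * float(np.sum(np.asarray(x) ** 2))
    return loss

val = pointwise_loss(1.0, 0.6, [(0.01, np.array([1.0, 2.0])), (0.01, 0.5)])
```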

SLIDE 12

Method

Gradients (1/2)

To learn the parameters \Theta^{(\ell)}, we use the stochastic gradient descent (SGD) algorithm, and have the gradients of the model parameters for a randomly sampled pair (u, i) \in \mathcal{R}^P \cup \mathcal{R}^A,

\nabla U_{u \cdot} = -e_{ui} \frac{1}{|\mathcal{U}^P_i \backslash \{u\}|} \sum_{u' \in \mathcal{U}^P_i \backslash \{u\}} P_{u' \cdot} - e_{ui} \frac{1}{|\mathcal{U}^{E(\ell)}_i|} \sum_{w \in \mathcal{U}^{E(\ell)}_i} E_{w \cdot} + \lambda_u U_{u \cdot}, \qquad (6)

\nabla V_{i \cdot} = -e_{ui} \frac{1}{|\mathcal{I}^P_u \backslash \{i\}|} \sum_{i' \in \mathcal{I}^P_u \backslash \{i\}} \tilde{P}_{i' \cdot} - e_{ui} \frac{1}{|\mathcal{I}^{E(\ell)}_u|} \sum_{j \in \mathcal{I}^{E(\ell)}_u} \tilde{E}_{j \cdot} + \alpha_v V_{i \cdot}, \qquad (7)

\nabla P_{u' \cdot} = -e_{ui} \frac{1}{|\mathcal{U}^P_i \backslash \{u\}|} U_{u \cdot} + \lambda_p P_{u' \cdot}, \quad u' \in \mathcal{U}^P_i \backslash \{u\}. \qquad (8)

SLIDE 13

Method

Gradients (2/2)

\nabla E_{w \cdot} = -e_{ui} \frac{1}{|\mathcal{U}^{E(\ell)}_i|} U_{u \cdot} + \lambda_e E_{w \cdot}, \quad w \in \mathcal{U}^{E(\ell)}_i, \qquad (9)

\nabla \tilde{P}_{i' \cdot} = -e_{ui} \frac{1}{|\mathcal{I}^P_u \backslash \{i\}|} V_{i \cdot} + \alpha_p \tilde{P}_{i' \cdot}, \quad i' \in \mathcal{I}^P_u \backslash \{i\}, \qquad (10)

\nabla \tilde{E}_{j \cdot} = -e_{ui} \frac{1}{|\mathcal{I}^{E(\ell)}_u|} V_{i \cdot} + \alpha_e \tilde{E}_{j \cdot}, \quad j \in \mathcal{I}^{E(\ell)}_u, \qquad (11)

\nabla b_u = -e_{ui} + \beta_u b_u, \quad \nabla b_i = -e_{ui} + \beta_v b_i, \qquad (12)

where e_{ui} = r_{ui} - \hat{r}^{(\ell)}_{ui} is the difference between the true preference and the predicted preference.

SLIDE 14

Method

Update Rules

We have the update rule

\theta^{(\ell)} \leftarrow \theta^{(\ell)} - \gamma \nabla \theta^{(\ell)}, \qquad (13)

where \gamma is the learning rate, and \theta^{(\ell)} \in \Theta^{(\ell)} is a model parameter to be learned.
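A minimal, illustrative sketch of one SGD step for a sampled pair (u, i), with made-up toy sizes. For brevity only U_u·, V_i· and the biases are updated here (Eqs. (6), (7), (12), (13)); the updates of P, E, P~ and E~ (Eqs. (8)-(11)) follow the same pattern:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, d = 4, 5, 3
gamma = 0.01                               # learning rate
lam_u = alpha_v = beta_u = beta_v = 0.01   # tradeoff parameters
U = rng.normal(scale=0.1, size=(n, d)); V = rng.normal(scale=0.1, size=(m, d))
P = rng.normal(scale=0.1, size=(n, d)); E = rng.normal(scale=0.1, size=(n, d))
Pt = rng.normal(scale=0.1, size=(m, d)); Et = rng.normal(scale=0.1, size=(m, d))
b_u = np.zeros(n); b_i = np.zeros(m)

def avg(M, idx):
    """Mean of the rows of M indexed by idx, or zeros if idx is empty."""
    idx = list(idx)
    return M[idx].mean(axis=0) if idx else np.zeros(M.shape[1])

def sgd_step(u, i, r_ui, IP_u, IE_u, UP_i, UE_i):
    """One SGD step for a sampled (u, i) via Eqs. (6), (7), (12), (13)."""
    p_bar, e_bar = avg(Pt, IP_u - {i}), avg(Et, IE_u)  # virtual user profiles
    q_bar, f_bar = avg(P, UP_i - {u}), avg(E, UE_i)    # virtual item profiles
    r_hat = (p_bar + e_bar) @ V[i] + (q_bar + f_bar) @ U[u] + b_u[u] + b_i[i]
    e_ui = r_ui - r_hat
    U[u] -= gamma * (-e_ui * (q_bar + f_bar) + lam_u * U[u])    # Eq. (6)
    V[i] -= gamma * (-e_ui * (p_bar + e_bar) + alpha_v * V[i])  # Eq. (7)
    b_u[u] -= gamma * (-e_ui + beta_u * b_u[u])                 # Eq. (12)
    b_i[i] -= gamma * (-e_ui + beta_v * b_i[i])
    return float(e_ui)

err1 = sgd_step(0, 1, 1.0, IP_u={0, 1}, IE_u={2}, UP_i={0, 3}, UE_i={1})
err2 = sgd_step(0, 1, 1.0, IP_u={0, 1}, IE_u={2}, UP_i={0, 3}, UE_i={1})
```

With a small learning rate, repeating the step on the same pair shrinks the prediction error, which is a quick sanity check of the gradient signs.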

SLIDE 15

Method

Identification of I^{E(ℓ)}_u and U^{E(ℓ)}_i

Note that we identify U^{E(ℓ)}_i and I^{E(ℓ)}_u in the following way:

For each user u ∈ U^E_i, we estimate the preference for the target item i, i.e., r̂^{(ℓ)}_{ui}, and take the τ|U^E_i| (τ ∈ (0, 1]) users with the highest scores as the potential users that are likely to purchase the target item i.

For each item j ∈ I^E_u, similarly, we estimate the preference r̂^{(ℓ)}_{uj}, and take the τ|I^E_u| items with the highest scores as the candidate items.

Finally, we save the models and data of the last L_0 epochs. The estimated preference is the average value of r̂^{(ℓ)}_{ui}, where ℓ ranges from L − L_0 + 1 to L.
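The top-τ selection above can be sketched generically; the function name, the rounding choice, and the example scores are assumptions for illustration:

```python
def select_top_tau(candidates, score, tau):
    """Keep the round(tau * |candidates|) candidates with the highest
    scores (at least one), as in the identification step above."""
    k = max(1, int(round(tau * len(candidates))))
    ranked = sorted(candidates, key=score, reverse=True)
    return set(ranked[:k])

# Hypothetical examined items with estimated preferences.
scores = {10: 0.9, 11: 0.1, 12: 0.5, 13: 0.7}
kept = select_top_tau(list(scores), scores.get, 0.5)
```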

SLIDE 16

Method

Algorithm of DSLM

1: Input: R^P, R^E, T, L, L_0, ρ, γ, λ_*, α_*, β_*
2: Output: U^{E(ℓ)}, I^{E(ℓ)} and Θ^{(ℓ)}, ℓ = L − L_0 + 1, ..., L
3: Let U^{E(1)} = U^E, I^{E(1)} = I^E, τ = 1
4: for ℓ = 1, ..., L do
5:     Initialize the model Θ^{(ℓ)}
6:     for t = 1, ..., T do
7:         Randomly sample R^A ⊂ R \ R^P with |R^A| = ρ|R^P|
8:         for t2 = 1, ..., |R^P ∪ R^A| do
9:             Randomly pick up (u, i) ∈ R^P ∪ R^A
10:            Calculate r̂^{(ℓ)}_{ui} via (4)
11:            Calculate ∇θ, θ ∈ Θ^{(ℓ)} via (6)-(12)
12:            Update θ, θ ∈ Θ^{(ℓ)} via (13)
13:        end for
14:    end for
15:    if ℓ > L − L_0 then
16:        Save the current model and data (Θ^{(ℓ)}, U^{E(ℓ)}, I^{E(ℓ)})
17:    end if
18:    if L > 1 and L > ℓ then
19:        τ ← τ × 0.9
20:        Select I^{E(ℓ+1)}_u with |I^{E(ℓ+1)}_u| = τ|I^E_u| for each u
21:        Select U^{E(ℓ+1)}_i with |U^{E(ℓ+1)}_i| = τ|U^E_i| for each i
22:    end if
23: end for

SLIDE 17

Experiments

Datasets and Evaluation Metrics

For direct comparison, we use the same datasets as TJSL, including ML100K, ML1M and Alibaba2015.

Table: Description of the datasets used in the experiments.

Dataset      |U|    |I|    |R^P|   |R^E|    |R^P_te|
ML100K       943    1682   9438    45285    2153
ML1M         6040   3952   90848   400083   45075
Alibaba2015  7475   5257   9290    62659    2322

For performance evaluation, we adopt two commonly used ranking-oriented metrics, i.e., Precision@5 and NDCG@5.

The data and code are available at http://csse.szu.edu.cn/staff/panwk/publications/DSLM/
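The two metrics can be sketched with their standard binary-relevance definitions; the paper's exact implementation may differ in details such as tie handling, and the example lists are made up:

```python
import math

def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k recommended items that are in the test set."""
    return sum(1 for i in ranked[:k] if i in relevant) / k

def ndcg_at_k(ranked, relevant, k=5):
    """Binary-gain DCG over the top k, normalized by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, i in enumerate(ranked[:k]) if i in relevant)
    idcg = sum(1.0 / math.log2(pos + 2)
               for pos in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0

prec = precision_at_k([3, 7, 1, 9, 4], {7, 9})  # 2 hits in the top 5
ndcg = ndcg_at_k([3, 7, 1, 9, 4], {7, 9})
```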

SLIDE 18

Experiments

Baselines and Parameter Configurations

For comparative studies, we include the competitive methods as follows.

BPR (Bayesian personalized ranking) FISM (Factored item similarity model) TJSL (Transfer via joint similarity learning) RBPR (Role-based Bayesian personalized ranking)

For parameter configurations, we fix the number of latent dimensions d = 20, the learning rate γ = 0.01 and the sampling parameter ρ = 3, and search the tradeoff parameters from {0.001, 0.01, 0.1} and the best iteration number T from {100, 500, 1000} via the NDCG@5 performance.

SLIDE 19

Experiments

Main Results

Dataset      Metric       BPR             FISM            TJSL            RBPR            DSLM
ML100K       Precision@5  0.0552±0.0006   0.0628±0.0015   0.0697±0.0016   0.0654±0.0013   0.0694±0.0014
ML100K       NDCG@5       0.0874±0.0020   0.1029±0.0017   0.1133±0.0047   0.1058±0.0047   0.1140±0.0022
ML1M         Precision@5  0.0928±0.0008   0.0971±0.0013   0.1012±0.0011   0.1086±0.0009   0.1107±0.0015
ML1M         NDCG@5       0.1121±0.0010   0.1189±0.0008   0.1248±0.0010   0.1327±0.0016   0.1353±0.0012
Alibaba2015  Precision@5  0.0050±0.0006   0.0046±0.0003   0.0071±0.0004   0.0076±0.0005   0.0087±0.0006
Alibaba2015  NDCG@5       0.0138±0.0017   0.0126±0.0009   0.0200±0.0008   0.0220±0.0013   0.0269±0.0017

Observations: In all cases, TJSL, RBPR and DSLM perform significantly better than BPR and FISM, which shows the effectiveness of introducing the auxiliary feedback to assist the task of learning users' preferences. Moreover, DSLM performs better than TJSL and RBPR in most cases, e.g., it is the best on ML1M and Alibaba2015 and is comparable with TJSL on ML100K, which clearly shows the usefulness of the dual similarity in capturing the correlations between users and items.

SLIDE 20

Conclusion

Conclusion

We propose a novel solution, i.e., dual similarity learning model (DSLM), for a recent and important recommendation problem called heterogeneous one-class collaborative filtering (HOCCF). In particular, we jointly learn the dual similarity among both users and items so as to exploit the complementarity well. Extensive empirical studies on three public datasets clearly show the effectiveness of our solution.

SLIDE 21

Thank you

Thank you!

We thank the anonymous reviewers for their expert and constructive comments and suggestions. We thank the support of the National Natural Science Foundation of China under Grant Nos. 61872249, 61502307, 61836005 and 61672358.
