

SLIDE 1

Bayesian Personalized Feature Interaction Selection for Factorization Machines

Yifan Chen¹,² Pengjie Ren¹ Yang Wang³ Maarten de Rijke¹

¹University of Amsterdam ²National University of Defense Technology ³Hefei University of Technology

SIGIR 2019

SLIDE 2

Introduction
◮ Factorization Machines
◮ Feature Interaction Selection

SLIDE 3

Factorization Machines

What is a Factorization Machine?

◮ A generic supervised learning method
◮ Accounts for feature interactions, i.e., combinations of features, with factorized parameters

Example: #Hashtags "comics", "marvel", "avengers"
Feature combinations: ("comics", "marvel"), ("comics", "avengers"), ("marvel", "avengers")

SLIDE 6

Factorization Machines

◮ Linear regression: O(d)

  \hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i

◮ Degree-2 polynomial regression: O(d^2)

  \hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij} \, x_i x_j

◮ Factorization machine: O(dk)

  \hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle v_i, v_j \rangle \, x_i x_j
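The O(dk) cost of the factorization machine comes from Rendle's reformulation of the pairwise sum; a minimal NumPy sketch (function and variable names are ours):

```python
import numpy as np

def fm_predict(x, b0, w, V):
    """Factorization-machine prediction.

    x: (d,) feature vector; b0: global bias; w: (d,) first-order weights;
    V: (d, k) factor matrix, so <v_i, v_j> is the weight of x_i * x_j.
    Uses the O(dk) identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ].
    """
    linear = b0 + w @ x
    s = V.T @ x                                     # (k,) per-factor sums
    pairwise = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
    return linear + pairwise
```

The naive O(d^2) double loop over feature pairs gives the same value, which is a convenient correctness check.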

SLIDE 7

Factorization Machines

Example

\hat{r}(\text{spider-man}) = b_0 + w_{comics} + w_{marvel} + w_{avengers}
  + \langle v_{comics}, v_{marvel} \rangle + \langle v_{comics}, v_{avengers} \rangle + \langle v_{marvel}, v_{avengers} \rangle

#Hashtags: "comics", "marvel", "avengers"
Feature combinations: ("comics", "marvel"), ("comics", "avengers"), ("marvel", "avengers")

SLIDE 8

Introduction
◮ Factorization Machines
◮ Feature Interaction Selection

SLIDE 9

Factorization Machines for Recommendation

◮ Effective use of historical interactions between users and items
◮ Incorporates additional information associated with users or items
◮ High-dimensional feature space
  ◮ #features = #users + #items + #additional
  ◮ not all features or feature interactions are helpful

SLIDE 11

Feature Interaction Selection (FIS)

Filter out useless feature interactions
◮ FIS: select a common set of interactions for all users
◮ P-FIS: select feature interactions for each user personally

[Figure: from the full set of pairs x1·x2, x1·x3, x1·x4, x2·x3, x2·x4, x3·x4 used by the FM, FIS keeps the common subset x1·x3, x1·x4, x2·x4 for both u1 and u2]

SLIDE 12

Feature Interaction Selection (FIS)

Filter out useless feature interactions
◮ P-FIS: select feature interactions for each user personally

[Figure: from the full set of pairs x1·x2, x1·x3, x1·x4, x2·x3, x2·x4, x3·x4 used by the FM, P-FIS keeps a different personalized subset for u1 and for u2]

◮ FIS: select a common set of interactions

SLIDE 13

Introduction
◮ Factorization Machines
◮ Feature Interaction Selection
Model description
◮ Bayesian personalized feature interaction selection
◮ Efficient optimization

SLIDE 14

Personalized Factorization Machines (PFM)

FM

  \hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij} \, x_i x_j

PFM

  \hat{r}(x) = b_u + \sum_{i=1}^{d} w_{ui} x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{uij} \, x_i x_j

Select 1st-order interactions {x_i} and 2nd-order interactions {x_i x_j} through the weights {w_{ui}} and {w_{uij}}
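A PFM prediction differs from an FM only in that the bias and weights are indexed by the user. A dense O(d^2) NumPy sketch for illustration (names are ours; a practical implementation would factorize the pairwise weights):

```python
import numpy as np

def pfm_predict(u, x, b, W1, W2):
    """Personalized FM prediction for user u.

    b: (m,) user biases; W1: (m, d) first-order weights; W2: (m, d, d)
    pairwise weights (only the upper triangle is used).  Dense form for
    illustration; BP-FIS would additionally mask these weights with the
    per-user selection variables.
    """
    iu, ju = np.triu_indices(x.size, k=1)
    pairwise = (W2[u][iu, ju] * x[iu] * x[ju]).sum()
    return b[u] + W1[u] @ x + pairwise
```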

SLIDE 15

Bayesian Variable Selection (BVS)

◮ Apply BVS to select feature interactions
  ◮ avoids expensive cross-validation
◮ Priors for BVS
  ◮ sparsity priors
  ◮ spike-and-slab priors

SLIDE 16

Bayesian Variable Selection

Spike-and-slab

◮ Spike (black arrow): p(w = 0) = 0.5
◮ Slab (blue line): a continuous density over the nonzero values

Sparsity priors (e.g., Laplace)

◮ f(w) = \frac{1}{2b} \exp\left( -\frac{|w - \mu|}{b} \right)
◮ p(w = 0) = 0
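The practical difference between the two prior families shows up in a quick simulation (the parameters here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Spike-and-slab: s ~ Bernoulli(0.5), w~ ~ N(0, 1), w = w~ * s.
# The spike puts positive probability mass exactly at zero,
# so the prior can switch a weight off entirely.
s = rng.random(n) < 0.5
w_ss = rng.normal(size=n) * s

# Laplace (sparsity) prior: a continuous density, so p(w = 0) = 0.
# It shrinks weights toward zero but never exactly to zero.
w_lap = rng.laplace(loc=0.0, scale=1.0, size=n)

print((w_ss == 0).mean())   # close to 0.5: exact zeros from the spike
print((w_lap == 0).mean())  # no exact zeros
```

This is why spike-and-slab priors are the natural choice for selection: a selection variable s = 0 removes the interaction outright instead of merely shrinking it.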

SLIDE 17

Hereditary Spike-and-Slab Priors

◮ Spike-and-slab: s ∼ Bernoulli(π), \tilde{w} ∼ N(0, 1), w = \tilde{w} · s
◮ Hereditary spike-and-slab
  ◮ captures the relation between 1st-order and 2nd-order feature interactions:

    s_{ui}, s_{uj} ∼ Bernoulli(π_1)
    p(s_{uij} = 1 | s_{ui} s_{uj} = 1) = 1        (strong heredity)
    p(s_{uij} = 1 | s_{ui} + s_{uj} = 1) = π_2    (weak heredity)
    p(s_{uij} = 1 | s_{ui} + s_{uj} = 0) = 0
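Sampling from the hereditary prior can be sketched as follows (a toy NumPy implementation; the names and loop structure are ours):

```python
import numpy as np

def sample_hereditary(d, pi1, pi2, rng):
    """Draw selection variables from the hereditary spike-and-slab prior.

    s1[i] ~ Bernoulli(pi1) selects first-order interactions.  The pair
    selector s2[i, j] is always on when both parents are on (strong
    heredity), on with probability pi2 when exactly one parent is on
    (weak heredity), and always off when neither is.
    """
    s1 = rng.random(d) < pi1
    s2 = np.zeros((d, d), dtype=bool)
    for i in range(d):
        for j in range(i + 1, d):
            parents = int(s1[i]) + int(s1[j])
            if parents == 2:
                s2[i, j] = True                    # strong heredity
            elif parents == 1:
                s2[i, j] = rng.random() < pi2      # weak heredity
    return s1, s2
```

By construction, a second-order interaction can only be selected if at least one of its first-order parents is selected.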

SLIDE 30

Generative Procedure of BP-FIS

Algorithm: Generative procedure

 1: for each user u ∈ U do
 2:   for each feature i ∈ F do
 3:     draw first-order interaction selection variable s_{ui} ∼ Bernoulli(π_1)
 4:     draw first-order interaction weight \tilde{w}_i ∼ N(0, 1)
 5:     w_{ui} = s_{ui} · \tilde{w}_i
 6:   for each feature pair i, j ∈ F do
 7:     draw second-order interaction selection variable s_{uij} ∼ p(s_{uij} | s_{ui}, s_{uj})
 8:     draw second-order interaction weight \tilde{w}_{ij} ∼ N(0, 1)
 9:     w_{uij} = s_{uij} · \tilde{w}_{ij}
10: for each feature vector x ∈ X do
11:   calculate the rating prediction \hat{r}(x) by PFM
12:   draw r(x) ∼ p(r | \hat{r}(x))
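Steps 1-9 of the procedure can be sketched for a single user (an illustrative NumPy implementation; names are ours):

```python
import numpy as np

def generate_user_weights(d, pi1, pi2, rng):
    """Steps 1-9 of the BP-FIS generative procedure for one user: draw
    the selection variables, draw N(0, 1) weights, and mask the weights
    by the selectors."""
    s1 = rng.random(d) < pi1                  # step 3: s_ui ~ Bernoulli(pi1)
    w1 = rng.normal(size=d) * s1              # steps 4-5: w_ui = s_ui * w~_i
    w2 = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            parents = int(s1[i]) + int(s1[j])
            # step 7: hereditary prior p(s_uij | s_ui, s_uj)
            p_on = 1.0 if parents == 2 else (pi2 if parents == 1 else 0.0)
            if rng.random() < p_on:           # s_uij = 1
                w2[i, j] = rng.normal()       # steps 8-9: w_uij = s_uij * w~_ij
    return s1, w1, w2
```

Plugging w1 and w2 (plus a user bias) into the PFM equation then gives the rating prediction of steps 10-12.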

SLIDE 31

Introduction
◮ Factorization Machines
◮ Feature Interaction Selection
Model description
◮ Bayesian personalized feature interaction selection
◮ Efficient optimization

SLIDE 34

Optimization

Maximum a posteriori: \arg\max_{\tilde{W}, S} p(\tilde{W}, S | R, X)

Exact inference is infeasible
◮ space complexity: O(md^2)
◮ time complexity: O(2^{md^2})

Variational inference
◮ approximate p(\tilde{W}, S | R, X) by q(\tilde{W}, S)
  ◮ space complexity: O(md)
◮ Stochastic Gradient Variational Bayes (SGVB)
  ◮ time complexity: O(dk), the same as FMs
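SGVB relies on reparameterized samples so that gradients flow through the random draw. The sketch below combines the Gaussian reparameterization trick with a binary-concrete (Gumbel-style) relaxation of the Bernoulli selector; it illustrates the general idea and is not necessarily the paper's exact estimator (all names are ours):

```python
import numpy as np

def sgvb_sample(mu, log_sigma, logit_pi, tau, rng):
    """One reparameterized draw of a masked weight w = w~ * s.

    Gaussian part: w~ = mu + sigma * eps, the standard reparameterization
    trick.  Bernoulli part: a binary-concrete relaxation of s with
    temperature tau, so the sample is differentiable in logit_pi and a
    stochastic gradient can be taken through it.
    """
    eps = rng.normal(size=np.shape(mu))
    w_tilde = mu + np.exp(log_sigma) * eps
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(logit_pi))
    logistic = np.log(u) - np.log1p(-u)              # Logistic(0, 1) noise
    s_soft = 1.0 / (1.0 + np.exp(-(logit_pi + logistic) / tau))
    return w_tilde * s_soft, s_soft
```

As tau approaches 0 the relaxed selector s_soft concentrates near {0, 1}, recovering hard selection.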

SLIDE 35

Introduction
◮ Factorization Machines
◮ Feature Interaction Selection
Model description
◮ Bayesian personalized feature interaction selection
◮ Efficient optimization
Experiment

SLIDE 36

Experimental Setup

Datasets

HetRec: Information Heterogeneity and Fusion in Recommender Systems
◮ MovieLens: rating and tagging
◮ LastFM: rating, tagging, social networking
◮ Delicious: rating, tagging, social networking

Baselines

◮ Factorization Machine (FM) (Rendle, 2010)
◮ Sparse Factorization Machine (SFM)
◮ Attentional Factorization Machine (AFM) (Xiao et al., 2017)
◮ Neural Factorization Machine (NFM) (He and Chua, 2017)

SLIDE 37

Experimental Setup

Our methods

Apply BP-FIS to a linear FM and a non-linear FM:
◮ BP-FM
◮ BP-NFM

Evaluation

Top-N recommendation
◮ Leave-One-Out Cross-Validation (LOOCV)
◮ ranking among 100 items
◮ metrics: HR@N and ARHR@N

SLIDE 38

Overall Performance

Table: Delicious

Method   HR@1    HR@10     ARHR@10
FM       0.0202  0.1147    0.0440
SFM      0.0229  0.1212    0.0465
AFM      0.0274  0.1169    0.0494
BP-FM    0.0278  0.1240**  0.0509*
NFM      0.0229  0.1065    0.0426
BP-NFM   0.0268  0.1289**  0.0504**

∗ and ∗∗ indicate that the best score is significantly better than the second best score with p < 0.1 and p < 0.05, respectively.

◮ SFM outperforms FM and AFM in HR@10: shows the need for FIS
◮ BP-FM and BP-NFM significantly outperform FM and NFM, respectively: shows the effect of P-FIS

SLIDE 39

Impact of Embedding Size

[Figure: MovieLens HR@10 for k = 64, 128, 256; methods: AFM, FM, SFM, NFM, BP-FM, BP-NFM]

◮ k = 64: P-FIS has an insignificant effect on FMs
◮ k = 128, 256:
  ◮ BP-FM and BP-NFM significantly outperform FM and NFM
  ◮ BP-NFM does not outperform BP-FM

SLIDE 40

Case study

[Figure: selected feature interactions among the hashtags "action", "buddy", "comedy", "sequel"]

Figure: User 1, top-1 recommendation

SLIDE 41

Case study

[Figure: selected feature interactions among the hashtags "action", "buddy", "comedy", "sequel"]

Figure: User 2, top-5 recommendation

SLIDE 42

Case study

[Figure: selected feature interactions among the hashtags "action", "buddy", "comedy", "sequel"]

Figure: User 3, not recommended

SLIDE 43

Introduction
◮ Factorization Machines
◮ Feature Interaction Selection
Model description
◮ Bayesian personalized feature interaction selection
◮ Efficient optimization
Experiment
Conclusion

SLIDE 46

Conclusion

1. We study personalized feature interaction selection (P-FIS) for Factorization Machines.

2. We propose a Bayesian personalized feature interaction selection (BP-FIS) method based on Bayesian variable selection.
   ◮ We propose hereditary spike-and-slab priors to achieve P-FIS.
   ◮ BP-FIS is a plug-and-play framework for FMs.

3. We design an efficient optimization algorithm based on Stochastic Gradient Variational Bayes (SGVB).

SLIDE 47

Future Work

1. Extend BP-FIS to select higher-order feature interactions
2. Consider group-level personalization via clustering to speed up training

SLIDE 48

Thank You

Source code

https://github.com/yifanclifford/BP-FIS

Contact

◮ Yifan Chen (https://sites.google.com/view/yifanchenuva/home)
◮ y.chen4@uva.nl

SLIDE 49

Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In SIGIR. ACM, 355–364.

Steffen Rendle. 2010. Factorization Machines. In ICDM. IEEE, 995–1000.

Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. In IJCAI. 3119–3125.
