BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer - PowerPoint PPT Presentation



SLIDE 1

BERT4Rec : Sequential Recommendation with Bidirectional Encoder Representations from Transformer

Advisor: Jia-Ling Koh
Presenter: You-Xiang Chen
Source: CIKM’19
Date: 2020/04/20

SLIDE 2

INTRODUCTION

SLIDE 3

Introduction

Sequential Recommendation

Given a user's historical interaction subsequence, the recommender system predicts the next (target) item.

SLIDE 4

Motivation & Goal

▪ Unidirectional models often assume a rigidly ordered sequence over data, which is not always true for user behaviors in real-world applications.

▪ Proposing a bidirectional self-attention network: BERT4Rec.

SLIDE 5

Motivation & Goal

▪ Conventional bidirectional models encode each historical subsequence to predict its target item.

▪ This approach is very time- and resource-consuming, since a new sample must be created for each position in the sequence and each one predicted separately.

▪ Introducing the Cloze task to produce more training samples and train a more powerful model (a sketch follows below).

[Figure: historical subsequences, each paired with its target item]
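As an illustration of the Cloze task, here is a minimal Python sketch of how masked training samples could be generated (the function name, mask id, and masking probability are assumptions for illustration, not the authors' code):

```python
import random

MASK_ID = 0      # assumed id reserved for the special [mask] token
IGNORE = -100    # label for unmasked positions, skipped by the loss later

def cloze_mask(sequence, mask_prob=0.2, seed=None):
    """Randomly replace items with [mask]; return (masked sequence, labels)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)   # hide the item...
            labels.append(item)      # ...and keep it as the prediction target
        else:
            masked.append(item)
            labels.append(IGNORE)
    return masked, labels

# Each call on the same history yields a fresh training sample.
history = [3, 7, 12, 5, 9]
print(cloze_mask(history, mask_prob=0.4, seed=1))
```

In the paper, extra samples that mask only the last item of each sequence are also produced, so that training matches the next-item prediction performed at test time.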

SLIDE 6

METHOD

SLIDE 7

Problem Statement

• Sets of users and items: U, V

• Interaction sequence for user u: S_u = [v_1^(u), v_2^(u), …, v_{n_u}^(u)]

• Output: p(v_{n_u+1}^(u) = v | S_u), the probability that user u interacts with item v at the next step

SLIDE 8

Framework

SLIDE 9

Embedding Layer

Input representation

h_i^0 = v_i + p_i (d-dimensional)

v_i: row of the item embedding matrix; p_i: row of the learnable position embedding matrix
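A minimal PyTorch sketch of this embedding layer (the class name and the two extra ids reserved for [mask] and padding are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class BERT4RecEmbedding(nn.Module):
    """h^0 = item embedding + learnable position embedding."""
    def __init__(self, num_items, max_len, d_model):
        super().__init__()
        self.item_emb = nn.Embedding(num_items + 2, d_model)  # + [mask], padding
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, item_ids):  # item_ids: (batch, seq_len)
        positions = torch.arange(item_ids.size(1), device=item_ids.device)
        return self.item_emb(item_ids) + self.pos_emb(positions)  # h^0
```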

SLIDE 10

Transformer Layer

Multi-Head Self-Attention
Scaled Dot-Product Attention

H^l = [h_1^l, h_2^l, …, h_t^l]

Multi-head attention projects H^l into h subspaces, one per attention head, with different learnable linear projections, applies scaled dot-product attention in each subspace, and concatenates the results:

Attention(Q, K, V) = softmax(QKᵀ / √(d/h)) V
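A minimal PyTorch sketch of scaled dot-product attention (the shapes and function name are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """softmax(QKᵀ / √d_k) V for tensors shaped (batch, heads, seq_len, d_k).

    With h heads, d_k = d/h, matching the √(d/h) scaling above.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise item-item weights
    return F.softmax(scores, dim=-1) @ v

# Example: 2 heads over a sequence of 5 items with d = 8 (d_k = 4).
q = k = v = torch.randn(1, 2, 5, 4)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 2, 5, 4])
```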

SLIDE 11

Transformer

SLIDE 12

Multi-Head Attention
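For reference, the standard multi-head formulation used in the BERT4Rec paper:

MH(H^l) = [head_1; head_2; …; head_h] W^O

head_i = Attention(H^l W_i^Q, H^l W_i^K, H^l W_i^V)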

SLIDE 13

Transformer Layer

Position-wise Feed-Forward Network

Applied separately and identically at each position, using the Gaussian Error Linear Unit (GELU) activation function:

FFN(x) = GELU(x W^(1) + b^(1)) W^(2) + b^(2)
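A minimal PyTorch sketch of the position-wise FFN (the class name and inner width d_ff are assumptions):

```python
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    """FFN(x) = GELU(x W1 + b1) W2 + b2, applied identically at each position."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),   # W1, b1
            nn.GELU(),
            nn.Linear(d_ff, d_model),   # W2, b2
        )

    def forward(self, x):
        # x: (batch, seq_len, d_model); nn.Linear acts on the last dim,
        # so every position is transformed separately and identically.
        return self.net(x)
```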

SLIDE 14

Gaussian Error Linear Units

https://arxiv.org/pdf/1606.08415.pdf
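For reference, the cited paper defines the activation as the input scaled by the standard Gaussian CDF, commonly computed with a tanh approximation:

GELU(x) = x·Φ(x) ≈ 0.5x(1 + tanh(√(2/π)(x + 0.044715x³)))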

SLIDE 15

Transformer Layer

Stacking Transformer Layers

Each sub-layer is wrapped with a residual connection, dropout, and layer normalization: LN(x + Dropout(sublayer(x)))

LN(·) : layer normalization function

https://arxiv.org/pdf/1607.06450.pdf
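A minimal PyTorch sketch of this sub-layer wrapper (names assumed, not the authors' code):

```python
import torch.nn as nn

class SublayerConnection(nn.Module):
    """Wraps a sub-layer as LN(x + Dropout(sublayer(x)))."""
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)   # LN(·)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Residual connection around the sub-layer, then layer normalization;
        # applied to both the attention and feed-forward sub-layers.
        return self.norm(x + self.dropout(sublayer(x)))
```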

SLIDE 16

Output Layer

P(v) = softmax(GELU(h_t^L W^P + b^P) Eᵀ + b^O)

W^P: learnable projection matrix; E: embedding matrix of items (shared with the input embedding layer)
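A PyTorch sketch of this output layer, with E tied to the input item embeddings (class and parameter names are assumptions):

```python
import torch
import torch.nn as nn

class OutputLayer(nn.Module):
    """logits = GELU(h W^P + b^P) Eᵀ + b^O; softmax is folded into the loss."""
    def __init__(self, d_model, item_emb: nn.Embedding):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)  # W^P, b^P
        self.item_emb = item_emb                  # shared item embedding matrix E
        self.out_bias = nn.Parameter(torch.zeros(item_emb.num_embeddings))  # b^O

    def forward(self, h):  # h: (batch, seq_len, d_model) final hidden states
        return nn.functional.gelu(self.proj(h)) @ self.item_emb.weight.T + self.out_bias
```

Sharing E between input and output reduces the parameter count; this is what "embedding matrix of items" on the slide refers to.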

SLIDE 17

Model Learning

L = (1/|S_u^m|) Σ_{v_m ∈ S_u^m} −log P(v_m = v_m* | S_u′)

S_u′: masked version of the user behavior sequence; S_u^m: the masked items in it
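A sketch of this loss in PyTorch, assuming labels carry the true item id at masked positions and a sentinel everywhere else (cf. the cloze_mask sketch above; the function name is assumed):

```python
import torch.nn.functional as F

def masked_item_loss(logits, labels, ignore_index=-100):
    """Mean negative log-likelihood over masked positions only.

    logits: (batch, seq_len, num_items) scores from the output layer.
    labels: (batch, seq_len) LongTensor with the true item id at masked
            positions and ignore_index elsewhere.
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=ignore_index,
    )
```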

SLIDE 18

EXPERIMENT

SLIDE 19

Datasets

SLIDE 20

Baselines

▪ POP, BPR-MF, NCF (non-sequential)
▪ FPMC (Markov chain)
▪ GRU4Rec⁺, Caser, SASRec (sequential)

SLIDE 21

Evaluation metrics

Hit Ratio:

HR@K = (Number of Hits @ K) / |GT|

Normalized Discounted Cumulative Gain:

DCG@K = Σ_{i=1}^{K} (2^{rel_i} − 1) / log₂(i + 1),  NDCG@K = DCG@K / IDCG@K

Mean Reciprocal Rank:

MRR = (1/|Q|) Σ_{i=1}^{|Q|} 1/rank_i
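A small self-contained sketch of these metrics for one test instance with a single ground-truth item, the setting in which the paper evaluates (function name assumed):

```python
import math

def hr_ndcg_mrr_at_k(ranked_items, target, k=10):
    """HR@k, NDCG@k, MRR for one ranked list with one relevant item.

    With a single relevant item, DCG@k reduces to 1/log2(rank + 1)
    and IDCG@k = 1, so NDCG@k simplifies accordingly.
    """
    topk = ranked_items[:k]
    hr = 1.0 if target in topk else 0.0
    ndcg = 1.0 / math.log2(topk.index(target) + 2) if target in topk else 0.0
    mrr = 1.0 / (ranked_items.index(target) + 1) if target in ranked_items else 0.0
    return hr, ndcg, mrr

# Example: target ranked 3rd among the candidates.
print(hr_ndcg_mrr_at_k([5, 9, 42, 7], target=42, k=3))  # (1.0, 0.5, 0.333...)
```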

SLIDE 22

Performance

[Results table labels: worst, non-sequential, sequential, Transformer]

SLIDE 23

Analysis on Bidirection and Cloze

SLIDE 24

CONCLUSION

▪ We introduce a deep bidirectional sequential model called BERT4Rec for sequential recommendation.

▪ For model training, we introduce the Cloze task, which predicts the masked items using both left and right context.

▪ Extensive experimental results on four real-world datasets show that our model outperforms state-of-the-art baselines.