Deep Learning Based Recommendation Systems: Part 2 - Prof. Srijan Kumar - PowerPoint PPT Presentation


SLIDE 1

Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

CSE 6240: Web Search and Text Mining. Spring 2020

Deep Learning Based Recommendation Systems: Part 2

  • Prof. Srijan Kumar

http://cc.gatech.edu/~srijan

SLIDE 2

Announcements

  • HW3:

– Deadline extended to Wednesday upon multiple requests.
– Sample output released on Piazza. Please check your submissions.

  • Project:

– Milestone grades released. Grade queries to be submitted by Tuesday night.

  • Very nicely written reports!

– Good luck for the final presentation and report!

SLIDE 3

Today’s Lecture

  • Introduction
  • Neural Collaborative Filtering
  • RRN
  • JODIE
SLIDE 4

RRN

  • RRN = Recurrent Recommender Networks
  • One of the first methods to model the temporal evolution of user and item behavior
  • Reference paper: Recurrent Recommender Networks. C.-Y. Wu, A. Ahmed, A. Beutel, A. Smola, H. Jing. WSDM 2017.

SLIDE 5

Traditional Methods

  • Existing models assume user and item states are stationary
    – States = embeddings, hidden factors, representations
  • However, user preferences and item states change over time
  • How to model this?
  • Key idea: use RNNs to learn the evolution of user embeddings

SLIDE 6

User Preferences

  • User preferences change over time
(Figure: a user’s taste 10 years ago vs. now)

SLIDE 7

Item States

  • Movie reception changes over time
(Figure: a movie shifts from “bad movie” to “so bad that it’s great to watch”)

SLIDE 8

Exogenous Effects

“La La Land” won big at Golden Globes

SLIDE 9

Seasonal Effects

Only watch during Christmas

SLIDE 10

Traditional Methods

  • Traditional matrix factorization, including NCF, assumes user state u_i and item state m_j are fixed and independent of each other
  • Both states are used to make predictions about the rating score r_ij
  • Right figure: latent-variable block diagram of traditional MF

SLIDE 11

RRN Framework

  • RRN innovates by modeling temporal dynamics within each user state u_i and movie state m_j
  • u_i^t depends on u_i^{t-1} and influences u_i^{t+1}
    – Same for movies
  • User and item states are independent of each other

SLIDE 12

Model Learning Setting

  • Actions are happening over time
  • How do we split training and testing data to respect the time dependency?

SLIDE 13

Traditional Random Split: N/A

  • A random train/test split violates the temporal dependency
    – Future actions can land in train, while past actions can land in test

SLIDE 14

Realistic Learning Setting

  • Train on the first K% of the data and test on the last data points
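This temporal split can be sketched in a few lines of Python (a minimal illustration; the function and field names are my own, not from the lecture):

```python
# Hypothetical sketch: time-ordered train/test split for interaction data.
# Each interaction is a tuple (user, item, rating, timestamp).

def temporal_split(interactions, train_frac=0.8):
    """Sort by timestamp and put the first train_frac of events in train."""
    ordered = sorted(interactions, key=lambda x: x[3])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

events = [("u1", "m1", 4, 10), ("u2", "m1", 5, 30), ("u1", "m2", 3, 20)]
train, test = temporal_split(events, train_frac=2 / 3)
# Every training event precedes every test event in time.
assert max(e[3] for e in train) <= min(e[3] for e in test)
```

Unlike a random split, every test interaction here occurs strictly after the training interactions, so the model never trains on the future.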

SLIDE 15

RRN Model

  • Train two RNNs: one for all users and another for all movies
    – User RNN parameters are shared across all users; same for movies
(Figure: User RNN and Movie RNN)

SLIDE 16

RRN Process

  • Initialization: user and movie embeddings are initialized
    – Initialization can be one-hot
  • Embedding update
  • Prediction: to predict the rating a user gives to a movie, the user’s embedding is multiplied with the movie’s embedding
  • Loss: the user-movie rating prediction error is used to update the RNN parameters

SLIDE 17

User RNN

  • The User RNN takes a user’s (movie, rating) sequence
    – Each input: concatenation of the movie embedding and a one-hot vector of the rating score
    – RNN initialization: a special ‘new’ vector indicates a new user
  • For the next user, the process is repeated, starting from initialization
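Building one step of that input sequence can be sketched as follows (a toy illustration with a 2-dimensional movie embedding and a 5-level rating scale; the helper names are mine):

```python
def one_hot(rating, levels=5):
    """One-hot encode a rating in {1, ..., levels}."""
    v = [0.0] * levels
    v[rating - 1] = 1.0
    return v

def user_rnn_input(movie_embedding, rating):
    # Each step's input: [movie embedding ; one-hot rating vector]
    return list(movie_embedding) + one_hot(rating)

x = user_rnn_input([0.2, -0.1], rating=4)
assert len(x) == 7      # 2-dim embedding + 5-dim one-hot
assert x[5] == 1.0      # the slot for rating 4 is set
```

The Movie RNN on the next slide uses the mirror-image input: the user embedding concatenated with the same one-hot rating vector.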

SLIDE 18

Movie RNN

  • The Movie RNN takes the movie’s (user, rating) sequence
    – Each input: concatenation of the user embedding and a one-hot vector of the rating score
    – RNN initialization: a special ‘new’ vector indicates a new movie
  • For the next movie, the process is repeated, starting from initialization

SLIDE 19

Rating Prediction

  • What rating does user u_i give to movie m_j at time t?
  • Take the user and movie embeddings up to time t and output the rating
  • Output function: MLP, Hadamard product, etc.
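The simplest choice of output function, the inner product (sum of the Hadamard product) of the two embeddings, can be sketched as:

```python
def predict_rating(user_emb, movie_emb):
    # Inner product of the user and movie embeddings at time t:
    # element-wise (Hadamard) product, then sum.
    return sum(a * b for a, b in zip(user_emb, movie_emb))

assert predict_rating([1.0, 2.0], [0.5, 0.25]) == 1.0
```

An MLP output function would instead feed the concatenated (or Hadamard-multiplied) embeddings through learned layers, trading interpretability for flexibility.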

SLIDE 20

Model Training

  • Learn the model parameters θ such that the predicted rating is close to the actual rating
  • R(θ) is a regularization term to avoid overfitting
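The objective, squared rating error plus a regularizer, can be sketched numerically (a toy scalar version with an L2 regularizer standing in for R; the function name and weight values are illustrative):

```python
def rrn_objective(preds, actuals, params, lam=0.1):
    """Squared rating-prediction error plus an L2 regularizer
    (one possible form of the regularization term R)."""
    err = sum((p - a) ** 2 for p, a in zip(preds, actuals))
    reg = lam * sum(w * w for w in params)
    return err + reg

# err = (3-3)^2 + (4-5)^2 = 1.0; reg = 0.1 * (1 + 4) = 0.5
loss = rrn_objective([3.0, 4.0], [3.0, 5.0], params=[1.0, -2.0])
assert abs(loss - 1.5) < 1e-9
```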
SLIDE 21

Experiments

  • Three datasets, several baselines
    – PMF: Salakhutdinov & Mnih, NIPS ’07
    – T-SVD: Koren, KDD ’09
    – U-AR & I-AR: Sedhain et al., WWW ’15
  • Metric: RMSE (Root Mean Square Error)

SLIDE 22

Temporal Effects

  • How well does the model capture temporal effects?

SLIDE 23

Exogenous Effects

  • RRN automatically captures exogenous effects
(Figure: rating jump around the Oscars & Golden Globes)

SLIDE 24

System Effects

  • RRN automatically learns system effects
(Figure: Netflix changed the Likert scale)

SLIDE 25

Movie Age Effect

  • RRN automatically learns effects that are typically captured via hand-crafted features
(Figure: movie-age effects)

SLIDE 26

RRN Summary

  • Novel model
  • Future prediction
  • Accurate prediction
  • Temporal dynamics

SLIDE 27

Today’s Lecture

  • Introduction
  • Neural Collaborative Filtering
  • RRN
  • JODIE
SLIDE 28

Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks

Srijan Kumar, Stanford University and Georgia Institute of Technology
Xikun Zhang, UIUC
Jure Leskovec, Stanford University

Code and Data: https://snap.stanford.edu/jodie

SLIDE 29

Temporal Interaction Networks

[KDD’19]

A flexible way to represent time-evolving relations between users and items. Each interaction is a tuple (user, item, time, features). The network is represented as a sequence of interactions, sorted by time.

SLIDE 30

Temporal Interaction Networks

Application domains: e-commerce, social media (accounts, posts), finance, web, education, IoT.

SLIDE 31

Temporal Interaction Networks

Application domains: e-commerce, social media, finance, web, education (students, courses), IoT.

SLIDE 32

Problem Setup

Given a temporal interaction network, where each interaction is a tuple (user, item, time, features), generate an embedding trajectory of every user and an embedding trajectory of every item.

SLIDE 33

Goal: Generate Dynamic Trajectory

Input: temporal interaction network. Output: dynamic trajectory in embedding space.

SLIDE 34

Challenges

Challenges in modeling:

  • C1: How to learn inter-dependent user and item embeddings?
  • C2: How to generate an embedding for every point in time?

Challenges in scalability:

  • C3: How to scalably train models on temporal networks?

SLIDE 35

Existing Methods

Deep recommender systems:

  • Time-LSTM (IJCAI 2017)
  • Recurrent Recommender Networks (WSDM 2017)
  • Latent Cross (WSDM 2018)

Dynamic co-evolution:

  • Deep Coevolve (DLRS 2016)

Temporal network embedding:

  • CTDNE (BigNet 2018)

Our model, JODIE, addresses all three challenges: C1 (co-influence), C2 (embedding at any time), C3 (training in batches).

SLIDE 36

Our Model: JODIE

JODIE: Joint Dynamic Interaction Embedding

  • A mutually-recursive recurrent neural network framework
  • Two components: an update component (User RNN and Item RNN) and a project component (projection operator)

SLIDE 37

JODIE: Update Component

  • Update component: a User RNN and an Item RNN; the weight matrices W are trained
  • All users share the User-RNN parameters
  • All items share the Item-RNN parameters
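The mutually recursive update can be sketched in one dimension (weights, the sigmoid choice, and function names are illustrative; the real model uses embedding vectors and trained weight matrices W):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update(u, i, feat, dt, W=(0.5, 0.5, 0.1, 0.1)):
    """One-dimensional sketch of JODIE's update component: after an
    interaction, the new user state depends on the old item state and
    vice versa (mutual recursion)."""
    w1, w2, w3, w4 = W
    u_new = sigmoid(w1 * u + w2 * i + w3 * feat + w4 * dt)
    i_new = sigmoid(w1 * i + w2 * u + w3 * feat + w4 * dt)
    return u_new, i_new

u, i = update(0.0, 0.0, feat=0.0, dt=0.0)
assert u == 0.5 and i == 0.5  # sigmoid(0) = 0.5
```

Because the same weights serve every user (and likewise every item), the model generalizes across entities instead of memorizing one RNN per user.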
SLIDE 38

JODIE: Project Component

How can we generate recommendations?

  • Rank items using distance in the embedding space
  • The projection operator maps the user embedding forward by the elapsed time Δ to obtain the projected embedding
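The projection operator drifts the user embedding with elapsed time. A minimal sketch, assuming the element-wise form û = (1 + w·Δ) ⊙ u from the JODIE paper, where w is a learned time-context vector (a 2-dimensional toy here):

```python
def project(u, w, dt):
    """Element-wise projection: each coordinate of the user embedding u
    is scaled by (1 + w_k * dt), so dt = 0 returns u unchanged."""
    return [(1.0 + wk * dt) * uk for uk, wk in zip(u, w)]

assert project([1.0, 2.0], w=[0.5, 0.0], dt=0.0) == [1.0, 2.0]
assert project([1.0, 2.0], w=[0.5, 0.0], dt=2.0) == [2.0, 2.0]
```

This is what makes C2 tractable: the embedding at any future instant is a cheap function of the last observed state and the elapsed time, with no interaction required.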

SLIDE 39

Training Objective

  • Goal: train the model such that the embedding of the predicted item is close to the embedding of the real next item
  • Predicted item = a linear layer applied on top of the projected user embedding
SLIDE 40

Summary: JODIE Formulation

Update embeddings, project the user embedding, and predict the next item. Loss: the predicted next item should be close to the real item embedding, with a smoothness term on the evolving embeddings.

SLIDE 41

Challenges in Dynamic Trajectories

Challenges in learning:

  • C1: How to learn inter-dependent user and item embeddings? Solution: update component
  • C2: How to generate an embedding for every point in time? Solution: project component

Challenges in scalability:

  • C3: How to scalably train models on temporal networks?

SLIDE 42

Standard Training Processes: N/A

  • Training must maintain temporal order
  • Splitting by user (or item) is not allowed: it introduces temporal inconsistency
  • Sequential processing is not scalable

SLIDE 43

T-batch: Temporal data batching algorithm

  • Main idea: create each batch as an independent edge set
  • Create a sequence of batches
    – Interactions in each batch are processed in parallel
    – Batches are processed in sequence to maintain temporal ordering
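A greedy construction in this spirit can be sketched as follows (an illustrative sketch, not the paper's exact algorithm: each interaction, scanned in time order, goes in the earliest batch after the last batch that touched either endpoint, so every batch stays an independent edge set):

```python
def t_batch(interactions):
    """Assign time-ordered (user, item) interactions to batches such
    that no batch contains the same user or item twice."""
    last = {}      # entity key -> index of the last batch containing it
    batches = []
    for u, i in interactions:
        # Earliest batch that respects both endpoints' ordering.
        b = max(last.get(("u", u), -1), last.get(("i", i), -1)) + 1
        if b == len(batches):
            batches.append([])
        batches[b].append((u, i))
        last[("u", u)] = last[("i", i)] = b
    return batches

bs = t_batch([("u1", "i1"), ("u2", "i2"), ("u1", "i2"), ("u2", "i1")])
assert bs == [[("u1", "i1"), ("u2", "i2")], [("u1", "i2"), ("u2", "i1")]]
```

Within a batch, the RNN updates can run in parallel because no two edges share an endpoint; processing batches in index order preserves each user's and item's temporal sequence.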

SLIDE 44

T-batch: Batching for Scalability

  • Iteratively select the maximal independent edge set.
(Figure: interactions assigned to Batch 1, Batch 2, Batch 3)

SLIDE 45

Challenges in Dynamic Trajectories

Challenges in learning:

  • C1: How to learn inter-dependent user and item embeddings? Solution: update component
  • C2: How to generate an embedding for every point in time? Solution: project component

Challenges in scalability:

  • C3: How to scalably train models on temporal networks? Solution: T-batch algorithm

SLIDE 46

Experiments: Prediction Tasks

  • Temporal link prediction:
    – Which item i ∈ 𝐽 will user u interact with at time t?
  • Temporal node classification:
    – Does a user u become anomalous after an interaction?
  • Settings:
    – Temporal splits: 80% train, 10% validation, 10% test
    – Metrics: mean reciprocal rank, Recall@10, AUROC

Code and Data: https://snap.stanford.edu/jodie
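The two ranking metrics can be computed from the rank of each true item among the candidates (a minimal sketch; function names are mine):

```python
def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank of the true item."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k=10):
    """Fraction of interactions whose true item ranks in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 4, 20]                       # rank of the true item per query
assert abs(mrr(ranks) - (1.0 + 0.25 + 0.05) / 3) < 1e-9
assert recall_at_k(ranks, k=10) == 2 / 3  # two of three ranked in top 10
```

MRR rewards placing the true item very near the top, while Recall@10 only asks whether it appears anywhere in the first page of recommendations.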

SLIDE 47

Datasets

Dataset     Users    Items  Interactions  Temporal Anomalies
Reddit      10,000   984    672,447       366
Wikipedia   8,227    1,000  157,474       217
LastFM      980      1,000  1,293,103     -
MOOC        7,047    97     411,749       4,066

NEW! NEW!

Code and Data: https://snap.stanford.edu/jodie

SLIDE 48

Experiment 1: Link Prediction

JODIE outperforms baselines by > 20%

(Bar chart, mean reciprocal rank: Latent Cross, Time-LSTM, RRN, CTDNE, Deep Coevolve, and JODIE; JODIE achieves the best score, 0.73)

SLIDE 49

Experiment 2: Node Classification

JODIE outperforms all baselines by >12%

(Bar chart, AUROC: Latent Cross, Time-LSTM, RRN, CTDNE, Deep Coevolve, and JODIE; JODIE achieves the best score, 0.73)

SLIDE 50

Experiment 3: T-batch Speed-up

T-batch leads to 8.5x speed-up in training

(Figure: running time of 44 minutes without T-batch vs. 5.1 minutes with T-batch, an 8.5x speed-up)

SLIDE 51

Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks

Srijan Kumar, Xikun Zhang, Jure Leskovec

Code and Data: https://snap.stanford.edu/jodie

JODIE generates and projects embedding trajectories

  • JODIE: a mutually-recursive RNN framework
  • T-batch: 8.5x training speed-up
  • Efficient in temporal link prediction and node classification
  • Extendible to > 2 entity types
SLIDE 52

Today’s Lecture

  • Introduction
  • Neural Collaborative Filtering
  • RRN
  • JODIE
SLIDE 53

Impact of Recommender Systems

  • Boosting the long tail
  • Pushing users to fringe content
  • Can be biased towards highly-popular items
SLIDE 54

Boosting the Long Tail

  • Recommender Systems