
Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining


CSE 6240: Web Search and Text Mining. Spring 2020

Recommendation Systems: Part II

  • Prof. Srijan Kumar

http://cc.gatech.edu/~srijan


Announcements

  • Project:

– Final report rubric: released
– Final presentation: details forthcoming


Recommendation Systems

  • Content-based
  • Collaborative Filtering
  • Latent Factor Models
  • Case Study: Netflix Challenge
  • Deep Recommender Systems

Slide reference: Mining Massive Datasets, http://mmds.org/


Latent Factor Models

  • These models learn latent factors to represent users and items from the rating matrix

– Latent factors are not directly observable
– They are derived from the data

  • Recall: Network embeddings
  • Methods:

– Singular value decomposition (SVD)
– Principal Component Analysis (PCA)
– Eigendecomposition


Latent Factors: Example

  • Embedding axes are a type of latent factor
  • In a user-movie rating matrix, movie latent factors can represent axes such as:

– Comedy vs. drama
– Degree of action
– Appropriateness for children

  • User latent factors measure a user’s affinity towards the corresponding movie factors

Latent Factors: Example

[Figure: movies plotted along two latent factor axes (Factor 1, Factor 2), e.g. The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, Dumb and Dumber, Ocean’s 11, and Sense and Sensibility, arranged from "geared towards females" to "geared towards males" and from "serious" to "funny"]

SVD

  • SVD decomposes an input matrix into multiple factor matrices:

– A = U S Vᵀ, where
– A: input data matrix (m × n)
– U: left singular vectors
– V: right singular vectors
– S: diagonal matrix of singular values

[Diagram: the m × n matrix A approximated as the product U S Vᵀ]
slide-8
SLIDE 8

Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

8

SVD

  • SVD gives the minimum reconstruction error (Sum of Squared Errors):

min_{U,S,V} Σ_{(i,j)∈A} (A_ij − [U S Vᵀ]_ij)²

  • SSE and RMSE are monotonically related:

– RMSE = √(SSE / c), where c is the number of entries

  • ⇒ SVD is minimizing RMSE
  • Complication: the sum in the SVD error term is over all entries, but our rating matrix R has missing entries

– Solution: a missing rating is interpreted as a zero rating


SVD on Rating Matrix

  • “SVD” on rating data: R ≈ Q · Pᵀ
  • Each row of Q represents an item
  • Each column of Pᵀ represents a user

[Example: a rating matrix R (items × users, entries 1-5) factorized into an item-factor matrix Q and a factor-user matrix Pᵀ]


Ratings as Products of Factors

  • How to estimate the missing rating of user x for item i?

[Example: the rating matrix R with a missing entry "?", shown next to its factor matrices Q and Pᵀ]

r̂_xi = q_i · p_x = Σ_f q_if · p_xf

where q_i = row i of Q and p_x = column x of Pᵀ
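The estimate is just the inner product of two learned factor vectors. A minimal sketch with made-up factor values (not the ones pictured):

```python
import numpy as np

# Hypothetical learned factors with k = 2 latent dimensions
Q = np.array([[1.1, 0.2],   # item 0
              [0.7, 1.5],   # item 1
              [0.3, 0.9]])  # item 2
P = np.array([[1.0, 0.4],   # user 0
              [0.6, 1.2]])  # user 1

def predict(x, i):
    """r̂_xi = q_i · p_x: dot product of item and user factor vectors."""
    return Q[i] @ P[x]

R_hat = Q @ P.T     # the full reconstructed (items x users) rating matrix
r = predict(0, 1)   # same value as R_hat[1, 0]
```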


Ratings as Products of Factors

  • In the worked example, plugging the factor vectors into r̂_xi = q_i · p_x gives an estimate of 2.4 for the missing rating

Latent Factor Models: Example

[Figure: the same two-factor movie map (Factor 1, Factor 2). Movies plotted in two dimensions; the dimensions have meaning]

Latent Factor Models

[Figure: users plotted in the same two-factor space as the movies. Users fall in the same space, showing their preferences]


SVD: Problems

  • SVD minimizes SSE on the training data

– Want large k (number of factors) to capture all the signals
– But error on test data begins to rise for k > 2

  • This is a classical example of overfitting:

– With too much freedom (too many free parameters), the model starts fitting noise
– The model fits the training data too well and thus does not generalize to unseen test data
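The first point can be checked directly: the training SSE of a rank-k SVD approximation is non-increasing in k (Eckart-Young), which is exactly what tempts us toward large k. A sketch on synthetic data:

```python
import numpy as np

# A dense synthetic "rating" matrix (20 users x 15 items, ratings 1-5)
rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(20, 15)).astype(float)

U, S, VT = np.linalg.svd(R, full_matrices=False)

def train_sse(k):
    """SSE of the best rank-k approximation of R."""
    R_k = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]
    return np.sum((R - R_k) ** 2)

# Training error keeps dropping as k grows, tempting us to pick a large k;
# held-out test error, however, eventually rises (overfitting).
errors = [train_sse(k) for k in range(1, 11)]
assert all(errors[i] >= errors[i + 1] - 1e-9 for i in range(len(errors) - 1))
```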


Preventing Overfitting

  • To solve overfitting we introduce regularization:

– Allow a rich model where there are sufficient data
– Shrink aggressively where data are scarce

min_{P,Q} Σ_{(x,i)∈training} (r_xi − q_i · p_x)² + λ₁ Σ_x ‖p_x‖² + λ₂ Σ_i ‖q_i‖²

– The first sum is the "error" term; the λ terms penalize the "length" of the factors
– λ₁, λ₂ … user-set regularization parameters

Note: We do not care about the "raw" value of the objective function, but we care about the P, Q that achieve the minimum of the objective
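The objective translates directly into code. A sketch with illustrative λ values (in practice they are tuned on held-out data):

```python
import numpy as np

def objective(ratings, Q, P, lam1=0.1, lam2=0.1):
    """Regularized objective: squared error over *observed* ratings only,
    plus penalties on the "length" of the user and item factor vectors.
    ratings: dict mapping (x, i) -> r_xi; P: user factors; Q: item factors."""
    error = sum((r - Q[i] @ P[x]) ** 2 for (x, i), r in ratings.items())
    length = lam1 * np.sum(P ** 2) + lam2 * np.sum(Q ** 2)
    return error + length

# Two observed ratings, perfectly fit: only the length terms remain
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
P = np.array([[1.0, 1.0]])
obj = objective({(0, 0): 1.0, (0, 1): 1.0}, Q, P)  # 0 + 0.1*2 + 0.1*2
```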

The Effect of Regularization

[Figure: the two-factor movie map (serious vs. funny, geared towards females vs. males) with The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean’s 11, Sense and Sensibility, and The Princess Diaries; regularization shrinks the factors of sparsely rated movies toward the origin]


Modeling Biases and Interactions

r̂_xi = μ + b_x + b_i + q_i · p_x

  • μ = overall mean rating
  • b_x = bias of user x
  • b_i = bias of movie i

  • Baseline predictor (μ + b_x + b_i):

– Separates users and movies
– Benefits from insights into users’ behavior
– Among the main practical contributions of the competition

  • User-movie interaction (q_i · p_x):

– Characterizes the matching between users and movies
– Attracts most research in the field
– Benefits from algorithmic and mathematical innovations


Baseline Predictor

  • We have expectations on the rating by user x of movie i, even without estimating x’s attitude towards movies like i:

– Rating scale of user x
– Values of other ratings the user gave recently (day-specific mood, anchoring, multi-user accounts)
– (Recent) popularity of movie i
– Selection bias; related to the number of ratings the user gave on the same day ("frequency")


Putting It All Together

  • Example:

– Mean rating: μ = 3.7
– You are a critical reviewer: your ratings are 1 star lower than the mean: b_x = −1
– Star Wars gets a mean rating 0.5 higher than the average movie: b_i = +0.5
– Predicted rating for you on Star Wars: 3.7 − 1 + 0.5 = 3.2

r̂_xi = μ + b_x + b_i + q_i · p_x

(overall mean rating + bias for user x + bias for movie i + user-movie interaction)
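The slide’s arithmetic as a one-liner:

```python
mu = 3.7    # overall mean rating
b_x = -1.0  # a critical reviewer rates 1 star below the mean
b_i = 0.5   # Star Wars rates 0.5 above the average movie
baseline = mu + b_x + b_i  # 3.2
```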


Fitting the New Model

  • Solve:

min_{Q,P} Σ_{(x,i)∈R} (r_xi − (μ + b_x + b_i + q_i · p_x))² + λ₁ Σ_i ‖q_i‖² + λ₂ Σ_x ‖p_x‖² + λ₃ Σ_x b_x² + λ₄ Σ_i b_i²

– The first sum is the goodness of fit; the λ terms are the regularization

  • Stochastic gradient descent to find the parameters

– Note: Both biases b_x, b_i as well as interactions q_i, p_x are treated as parameters (we estimate them)
– λ is selected via grid-search on a validation set
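A minimal SGD sketch of this fitting procedure. The data, learning rate, and the single shared λ are illustrative; a real run would pick λ by grid search on a validation set:

```python
import numpy as np

def fit_sgd(ratings, n_users, n_items, k=2, lr=0.01, lam=0.1, epochs=200, seed=0):
    """Fit r̂_xi = mu + b_x + b_i + q_i·p_x by stochastic gradient descent.
    ratings: list of (user x, item i, rating r_xi) triples."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])     # overall mean rating
    b_user = np.zeros(n_users)                   # user biases b_x
    b_item = np.zeros(n_items)                   # item biases b_i
    P = 0.1 * rng.standard_normal((n_users, k))  # user factors p_x
    Q = 0.1 * rng.standard_normal((n_items, k))  # item factors q_i
    for _ in range(epochs):
        for x, i, r in ratings:
            e = r - (mu + b_user[x] + b_item[i] + Q[i] @ P[x])
            # Gradient steps; each parameter is shrunk by its lam term
            b_user[x] += lr * (e - lam * b_user[x])
            b_item[i] += lr * (e - lam * b_item[i])
            P[x], Q[i] = (P[x] + lr * (e * Q[i] - lam * P[x]),
                          Q[i] + lr * (e * P[x] - lam * Q[i]))
    return mu, b_user, b_item, P, Q

# Toy data: (user, item, rating); user 0 has not rated item 2
data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 2)]
mu, bu, bi, P, Q = fit_sgd(data, n_users=3, n_items=3)
r_hat_02 = mu + bu[0] + bi[2] + Q[2] @ P[0]  # estimate the unseen rating
```

The tuple assignment updates P[x] and Q[i] from their old values simultaneously; biases and factors all share one λ here purely for brevity.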


Recommendation Systems

  • Content-based
  • Collaborative Filtering
  • Latent Factor Models
  • Case Study: Netflix Challenge
  • Deep Recommender Systems

Case Study: Netflix Prize



Case Study: The Netflix Prize

Competition:

  • Task: reduce RMSE
  • 2,700+ teams
  • $1 million prize for a 10% improvement on Netflix’s system

RMSE leaderboard:

– Global average: 1.1296
– User average: 1.0651
– Movie average: 1.0533
– Netflix: 0.9514
– Basic collaborative filtering: 0.94
– Collaborative filtering++: 0.91
– Latent factors: 0.90
– Latent factors + Biases: 0.89
– Grand Prize: 0.8563

Case Study: The Netflix Prize

  • Training data

– 100 million ratings, 480,000 users, 17,770 movies
– 6 years of data: 2000-2005

  • Test data

– Last few ratings of each user (2.8 million)
– Evaluation criterion: Root Mean Square Error

RMSE = √( (1/|T|) Σ_{(i,x)∈T} (r̂_xi − r_xi)² ), where T is the set of test ratings

– Netflix’s system RMSE: 0.9514
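The evaluation criterion as a small helper:

```python
import numpy as np

def rmse(pred, true):
    """Root Mean Square Error over aligned prediction/truth arrays."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return np.sqrt(np.mean((pred - true) ** 2))

score = rmse([3.0, 4.0], [3.0, 2.0])  # sqrt((0 + 4) / 2) = sqrt(2)
```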

BellKor Recommender System

  • The winner of the Netflix Challenge!
  • Multi-scale modeling of the data: combine top-level, "regional" modeling of the data with a refined, local view:

– Global: overall deviations of users/movies
– Factorization: addressing "regional" effects
– Collaborative filtering: extract local patterns

[Diagram: three nested levels, global effects → factorization → collaborative filtering]


Modeling Local & Global Effects

  • Global:

– Mean movie rating: 3.7 stars
– The Sixth Sense is 0.5 stars above average
– Joe rates 0.2 stars below average
– ⇒ Baseline estimate: Joe will rate The Sixth Sense 4 stars

  • Local neighborhood (CF/NN):

– Joe didn’t like the related movie Signs
– ⇒ Final estimate: Joe will rate The Sixth Sense 3.8 stars


Interpolation Weights

  • So far:

r̂_xi = b_xi + Σ_{j∈N(i;x)} w_ij (r_xj − b_xj)

– Weights w_ij are derived based on their role; no use of an arbitrary similarity measure (w_ij ≠ s_ij)
– Explicitly account for interrelationships among the neighboring movies

  • Next: latent factor model

– Extract "regional" correlations

[Diagram: global effects → factorization → CF/NN]
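A sketch of this neighborhood prediction rule. The weights below are made up for illustration; in the actual method the w_ij are learned jointly by minimizing prediction error:

```python
def predict_interpolated(b_xi, neighbors):
    """r̂_xi = b_xi + sum_j w_ij * (r_xj - b_xj) over neighboring movies j.
    neighbors: list of (w_ij, r_xj, b_xj) for movies j in N(i; x)."""
    return b_xi + sum(w * (r - b) for w, r, b in neighbors)

# Baseline for user x on movie i is 4.0; two rated neighbor movies
r_hat = predict_interpolated(4.0, [(0.3, 3.0, 4.2), (0.2, 5.0, 4.1)])
```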


Temporal Biases Of Users

  • Sudden rise in the average movie rating (early 2004)

– Improvements in Netflix
– GUI improvements
– Meaning of rating changed

  • Movie age

– Users prefer new movies without any particular reason
– Older movies are just inherently better than newer ones

  • Y. Koren, Collaborative filtering with temporal dynamics, KDD ’09


Temporal Biases & Factors

  • Original model:

r_xi = μ + b_x + b_i + q_i · p_x

  • Add time dependence to the biases:

r_xi = μ + b_x(t) + b_i(t) + q_i · p_x

– Make the parameters b_x and b_i depend on time
– (1) Parameterize the time-dependence by linear trends; (2) bin time, with each bin corresponding to 10 consecutive weeks

  • Add temporal dependence to the factors

– p_x(t) = user preference vector on day t

  • Y. Koren, Collaborative filtering with temporal dynamics, KDD ’09
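A sketch of a binned time-dependent item bias, b_i(t) = b_i + b_{i,Bin(t)}, in the spirit described above (the bin width follows the slide; the bias values are illustrative):

```python
BIN_WEEKS = 10  # each bin spans 10 consecutive weeks, per the slide

def item_bias(b_i_static, b_i_bins, week):
    """b_i(t) = b_i + b_{i,Bin(t)}: a static item bias plus a per-bin offset."""
    return b_i_static + b_i_bins[week // BIN_WEEKS]

# An item with static bias 0.5 and three learned bin offsets
bias = item_bias(0.5, [0.0, 0.1, -0.2], week=15)  # week 15 falls in bin 1
```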


Adding Temporal Effects

[Plot: RMSE vs. millions of parameters (log scale), comparing CF (no time bias), basic latent factors, CF (time bias), latent factors with biases, + linear time factors, + per-day user biases, and + CF; RMSE improves from roughly 0.92 down to roughly 0.875]


Case Study: The Netflix Prize

  • New update: added time; Latent factors + Biases + Time reaches RMSE 0.876
  • But still no prize! (Grand Prize target: 0.8563)
  • What to do next?


Standing on June 26th 2009


June 26th submission triggers 30-day “last call”


The Last 30 Days

  • Ensemble team formed

– A group of other teams on the leaderboard forms a new team
– Relies on combining their models
– Quickly also gets a qualifying score over 10%

  • BellKor

– Continues to get small improvements in their scores
– Realizes they are in direct competition with Ensemble

  • Strategy

– Both teams carefully monitor the leaderboard
– The only sure way to check for improvement is to submit a set of predictions
– This alerts the other team of your latest score

24 Hours from the Deadline

  • Submissions limited to 1 a day

– Only 1 final submission could be made in the last 24h

  • 24 hours before the deadline…

– A BellKor team member in Austria notices that Ensemble posts a score slightly better than BellKor’s

  • Frantic last 24 hours for both teams

– Much computer time spent on final optimization
– Carefully calibrated to end about an hour before the deadline

  • Final submissions

– BellKor submits a little early (on purpose), 40 mins before the deadline
– Ensemble submits their final entry 20 mins later
– …and everyone waits…


$1M Awarded Sept 21st 2009



Recommendation Systems

  • Content-based
  • Collaborative Filtering
  • Latent Factor Models
  • Case Study: Netflix Challenge
  • Deep Recommender Systems

Deep Recommender Systems

  • How can deep learning advance recommendation systems?
  • A simple way for content-based models: use CNNs and LSTMs to generate image and text features of items


Deep Recommender Systems

  • But how can DL be used for tasks and methods at the core of recommendation systems?

– For collaborative filtering?
– For latent factor models?
– For temporal dynamics?
– Some new techniques?


Why Deep Learning Techniques

Pros:

  • Capture non-linearity well
  • Non-manual representation learning
  • Efficient sequence modeling
  • Somewhat flexible and easy to retrain

Cons:

  • Lack of interpretability
  • Large data requirements
  • Extensive hyper-parameter tuning

Applicable DL Techniques

Deep Learning methods:

  • MLPs and AutoEncoders
  • CNNs
  • RNNs
  • Adversarial Networks
  • Attention models
  • Deep reinforcement learning

How to use these methods to improve recommender systems?


Several Methods

  • Neural Collaborative Filtering
  • Recurrent Recommender Systems
  • LatentCross
  • Dynamic User Model: JODIE

Neural Collaborative Filtering

  • Neural extensions of traditional recommender systems
  • Input: rating matrix, plus user profile and item features (optional)

– If user/item features are unavailable, we can use one-hot vectors

  • Output: user and item embeddings
  • Traditional matrix factorization is a special case of NCF
  • Reference: Neural Collaborative Filtering, He et al., WWW 2017


NCF Setup

  • User feature vector: v_u
  • Item feature vector: v_i
  • User embedding matrix: U
  • Item embedding matrix: I
  • Neural network: f
  • Neural network parameters: Θ
  • Predicted rating: r̂_ui = f(Uᵀv_u, Iᵀv_i; Θ)

NCF Model Architecture

  • Multiple fully connected layers form the Neural CF layers
  • The output is a predicted rating score r̂_ui
  • The real rating score is r_ui


NCF model: Loss function

  • Train on the difference between the predicted rating and the real rating
  • Use negative sampling to reduce the number of negative data points
  • Loss = cross-entropy loss
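A minimal NumPy sketch of the NCF idea: concatenated user/item embeddings feed a small fully connected stack whose sigmoid output is scored with cross-entropy against observed interactions and negative samples. All sizes, initializations, and data here are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, hidden = 4, 5, 3, 8

# Embedding matrices: one k-dimensional vector per user and per item
U = 0.1 * rng.standard_normal((n_users, k))
I = 0.1 * rng.standard_normal((n_items, k))
# One fully connected hidden layer plus a scalar output ("Neural CF layers")
W1 = 0.1 * rng.standard_normal((2 * k, hidden))
w2 = 0.1 * rng.standard_normal(hidden)

def predict(u, i):
    """Forward pass: concatenate embeddings -> ReLU layer -> sigmoid score."""
    h = np.maximum(0.0, np.concatenate([U[u], I[i]]) @ W1)
    return 1.0 / (1.0 + np.exp(-(h @ w2)))

def bce_loss(triples):
    """Cross-entropy over (user, item, label): label 1 = observed interaction,
    label 0 = a negative sample (an item the user did not interact with)."""
    eps = 1e-12
    return -np.mean([y * np.log(predict(u, i) + eps)
                     + (1 - y) * np.log(1.0 - predict(u, i) + eps)
                     for u, i, y in triples])

# One observed interaction and one negative sample per user (illustrative)
batch = [(0, 1, 1), (0, 3, 0), (1, 2, 1), (1, 0, 0)]
loss = bce_loss(batch)  # near log(2) at this small random initialization
```

Training would backpropagate this loss into W1, w2, and the embedding rows, which is where a framework with autodiff would normally take over.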