Web Mining and Recommender Systems: Advanced Recommender Systems (PowerPoint PPT Presentation)



SLIDE 1

Web Mining and Recommender Systems

Advanced Recommender Systems

SLIDE 2

This week

Methodological papers

  • Bayesian Personalized Ranking
  • Factorizing Personalized Markov Chains
  • Personalized Ranking Metric Embedding
SLIDE 3

This week

Goals:

SLIDE 4

This week

Application papers

  • Recommending Product Sizes to Customers
  • Playlist Prediction via Metric Embedding
  • Efficient Natural Language Response Suggestion for Smart Reply

SLIDE 5

This week

We (hopefully?) know enough by now to…

  • Read academic papers on Recommender Systems
  • Understand most of the models and evaluations used

See also – CSE291

SLIDE 6

Bayesian Personalized Ranking

SLIDE 7

Bayesian Personalized Ranking Goal: Estimate a personalized ranking function for each user

SLIDE 8

Bayesian Personalized Ranking

Why? Compare to the “traditional” approach of replacing “missing values” by 0. But “0”s aren’t necessarily negative!

SLIDE 9

Bayesian Personalized Ranking

Why? Compare to the “traditional” approach of replacing “missing values” by 0. This suggests a possible solution based on ranking.

SLIDE 10

Bayesian Personalized Ranking

Defn: AUC (for a user u), defined in terms of a scoring function that compares an item i to an item j for the user u.

The AUC essentially counts how many times the model correctly identifies that u prefers the item they bought (positive feedback) over the item they did not.

SLIDE 11

Bayesian Personalized Ranking

Defn: AUC (for a user u). AUC = 1: we always guess correctly among two potential items i and j. AUC = 0.5: we guess no better than random.
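The AUC formula itself is an image in the original slides; as a minimal sketch (function and item names are illustrative), the per-user AUC can be computed by comparing model scores over all (consumed, non-consumed) item pairs:

```python
def per_user_auc(scores, positives, negatives):
    """Fraction of (i, j) pairs, with i consumed and j not consumed,
    for which the model ranks i above j."""
    correct = 0
    total = 0
    for i in positives:
        for j in negatives:
            total += 1
            if scores[i] > scores[j]:
                correct += 1
    return correct / total

# A model that always ranks consumed items higher gets AUC = 1;
# random scores give about 0.5.
scores = {"a": 2.0, "b": 1.5, "c": 0.3, "d": 0.1}
print(per_user_auc(scores, positives=["a", "b"], negatives=["c", "d"]))  # 1.0
```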

SLIDE 12

Bayesian Personalized Ranking

Defn: AUC = Area Under the ROC Curve

SLIDE 13

Bayesian Personalized Ranking

Summary: Goal is to count how many times we identified i as being more preferable than j for a user u

SLIDE 14

Bayesian Personalized Ranking

Summary: Goal is to count how many times we identified i as being more preferable than j for a user u

SLIDE 15

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

Here the scoring function can be any function that compares the compatibility of i and j for a user u; e.g. it could be based on matrix factorization (the difference between the predicted scores for i and j).
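The factorization formula on the slide is an image in the original deck; a minimal numpy sketch of the idea, assuming a matrix-factorization scorer (the latent dimension and variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
K = 5                          # latent dimension (illustrative)
gamma_u = rng.normal(size=K)   # user factors
gamma_i = rng.normal(size=K)   # factors of consumed item i
gamma_j = rng.normal(size=K)   # factors of non-consumed item j

# Matrix-factorization compatibility difference between i and j for u
x_uij = gamma_u @ gamma_i - gamma_u @ gamma_j

# Smooth stand-in for the 0/1 indicator "i ranked above j"
smooth = sigmoid(x_uij)
print(smooth)
```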

SLIDE 16

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

SLIDE 17

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

SLIDE 18

Bayesian Personalized Ranking

Experiments:

  • Rossmann (online drug store)
  • Netflix (treated as a binary problem)
SLIDE 19

Bayesian Personalized Ranking

Experiments:

SLIDE 20

Bayesian Personalized Ranking

Morals of the story:

  • Given a “one-class” prediction task (like purchase prediction) we might want to optimize a ranking function rather than trying to factorize a matrix directly
  • The AUC is one such measure: it counts, among a user u, the items they consumed i, and the items they did not consume j, how often we correctly guessed that i was preferred by u
  • We can optimize this approximately by maximizing a smoothed (sigmoid-based) version of the ranking objective

SLIDE 21

Factorizing Personalized Markov Chains for Next-Basket Recommendation

SLIDE 22

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Goal: build temporal models just by looking at the item the user purchased previously


SLIDE 23

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Assumption: all of the information contained by temporal models is captured by the previous action; this is what’s known as a first-order Markov property.

SLIDE 24

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Is this assumption realistic?

SLIDE 25

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Data setup: Rossmann basket data

SLIDE 26

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 27

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Could we try to compute such probabilities just by counting? Seems okay, as long as the item vocabulary is small (I^2 possible item/item combinations to count). But it’s not personalized.
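Such counting could be sketched as follows (the data here is a toy example; names are illustrative):

```python
from collections import Counter, defaultdict

# Toy purchase sequences: one ordered list of items per user
sequences = [["milk", "bread", "milk"],
             ["bread", "milk", "eggs"],
             ["milk", "bread"]]

# Counts of (previous item -> next item) transitions across all users
pair_counts = defaultdict(Counter)
for seq in sequences:
    for prev, nxt in zip(seq, seq[1:]):
        pair_counts[prev][nxt] += 1

def p_next_given_prev(nxt, prev):
    """Unpersonalized first-order transition probability, by counting."""
    total = sum(pair_counts[prev].values())
    return pair_counts[prev][nxt] / total

# After "milk", users bought "bread" twice and "eggs" once:
print(p_next_given_prev("bread", "milk"))  # 2/3
```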

SLIDE 28

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize? Now we would have U*I^2 counts to compare. Clearly not feasible, so we need to estimate/model this quantity instead (e.g. by matrix factorization).
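A hedged sketch of the kind of factorized score that replaces the U*I^2 count table, assuming a pairwise-interaction decomposition into user/item and item/previous-item factors (the dimension and names are illustrative, not the paper's exact parameterization):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4  # latent dimension (illustrative)

# Factors for one (user u, candidate item i, previous item l)
V_u_i = rng.normal(size=K)   # user factors, user/item interaction
V_i_u = rng.normal(size=K)   # item factors, user/item interaction
V_i_l = rng.normal(size=K)   # item factors, item/previous-item interaction
V_l_i = rng.normal(size=K)   # previous-item factors

def fpmc_score(v_u_i, v_i_u, v_i_l, v_l_i):
    """Score of candidate i for user u given previous item l:
    a sum of two inner products, so we learn O((U + I) * K)
    parameters instead of counting U*I^2 cells."""
    return v_u_i @ v_i_u + v_i_l @ v_l_i

print(fpmc_score(V_u_i, V_i_u, V_i_l, V_l_i))
```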

SLIDE 29

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize?

SLIDE 30

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize?

SLIDE 31

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 32

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 33

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Results (F@5): FMC is not personalized; MF is personalized, but not sequentially aware.

SLIDE 34

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Morals of the story:

  • Can improve performance by modeling third-order interactions between the user, the item, and the previous item
  • This is simpler than temporal models – but makes a big assumption
  • Given the blowup in the interaction space, this can be handled by tensor decomposition techniques

SLIDE 35

Personalized Ranking Metric Embedding for Next New POI Recommendation

SLIDE 36

Personalized Ranking Metric Embedding for Next New POI Recommendation

Goal: Can we build better sequential recommendation models by using the idea of metric embeddings (distances rather than inner products)?

SLIDE 37

Personalized Ranking Metric Embedding for Next New POI Recommendation

Why would we expect this to work (or not)?

SLIDE 38

Personalized Ranking Metric Embedding for Next New POI Recommendation

Otherwise, goal is the same as the previous paper:

SLIDE 39

Personalized Ranking Metric Embedding for Next New POI Recommendation

Data

SLIDE 40

Personalized Ranking Metric Embedding for Next New POI Recommendation

Qualitative analysis

SLIDE 41

Personalized Ranking Metric Embedding for Next New POI Recommendation

Qualitative analysis

SLIDE 42

Personalized Ranking Metric Embedding for Next New POI Recommendation

Basic model (not personalized)

SLIDE 43

Personalized Ranking Metric Embedding for Next New POI Recommendation

Basic model (not personalized)

SLIDE 44

Personalized Ranking Metric Embedding for Next New POI Recommendation

Personalized version

SLIDE 45

Personalized Ranking Metric Embedding for Next New POI Recommendation

Personalized version
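The model equations on these slides are images in the original deck; as a hedged sketch, a PRME-style personalized score combines a sequential distance (previous POI to candidate) and a user-preference distance (user to candidate), in two separate embedding spaces, weighted by a parameter alpha (all names and the exact weighting are illustrative assumptions):

```python
import numpy as np

def prme_score(user_vec, prev_vec_s, cand_vec_s, cand_vec_p, alpha=0.5):
    """Lower is better: weighted sum of squared Euclidean distances in a
    sequential space (prev -> candidate) and a preference space
    (user -> candidate)."""
    d_seq = np.sum((prev_vec_s - cand_vec_s) ** 2)
    d_pref = np.sum((user_vec - cand_vec_p) ** 2)
    return alpha * d_pref + (1 - alpha) * d_seq

rng = np.random.default_rng(2)
u, prev_s, cand_s, cand_p = (rng.normal(size=3) for _ in range(4))
print(prme_score(u, prev_s, cand_s, cand_p))
```

Candidates would then be ranked by ascending score, since smaller distances mean a more compatible next POI.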

SLIDE 46

Personalized Ranking Metric Embedding for Next New POI Recommendation

Learning

SLIDE 47

Personalized Ranking Metric Embedding for Next New POI Recommendation

Results

SLIDE 48

Personalized Ranking Metric Embedding for Next New POI Recommendation

Morals of the story:

  • In some applications, metric embeddings might be better than inner products
  • Examples could include geographical data, but also others (e.g. playlists?)
SLIDE 49

Overview

Morals of the story:

  • Today we looked at two main ideas that extend the recommender systems we saw in class:
  • 1. Sequential Recommendation: Most of the dynamics due to time can be captured purely by knowing the sequence of items
  • 2. Metric Recommendation: In some settings, using inner products may not be the correct assumption

SLIDE 50

Web Mining and Recommender Systems

Real-world applications of recommender systems

SLIDE 51

Recommending product sizes to customers

SLIDE 52

Recommending product sizes to customers Goal: Build a recommender system that predicts whether an item will “fit”:

SLIDE 53

Recommending product sizes to customers Challenges:

  • Data sparsity: people have very few purchases from which to estimate size
  • Cold-start: How to handle new customers and products with no past purchases?
  • Multiple personas: Several customers may use the same account

SLIDE 54

Recommending product sizes to customers Data:

  • Shoe transactions from Amazon.com
  • For each shoe j, we have a reported size c_j (from the manufacturer), but this may not be correct!
  • Need to estimate the customer’s size (s_i), as well as the product’s true size (t_j)

SLIDE 55

Recommending product sizes to customers Loss function:

SLIDE 56

Recommending product sizes to customers Loss function:

SLIDE 57

Recommending product sizes to customers Loss function:

SLIDE 58

Recommending product sizes to customers

SLIDE 59

Recommending product sizes to customers Loss function:
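The loss formulas on these slides are images in the original deck; as a rough sketch of the single-parameter-per-customer/per-product idea (the threshold and the three-way decision rule here are illustrative assumptions, not the paper's exact loss):

```python
def predict_fit(s_i, t_j, threshold=0.5):
    """Three-way fit prediction from one latent size per customer (s_i)
    and one true size per product (t_j). The threshold is an assumption."""
    d = s_i - t_j
    if d > threshold:
        return "small"   # customer's size exceeds the product's: runs small
    if d < -threshold:
        return "large"   # product's size exceeds the customer's: runs large
    return "fit"

# Customer with estimated size 9.0:
print(predict_fit(s_i=9.0, t_j=9.2))   # "fit"
print(predict_fit(s_i=9.0, t_j=10.5))  # "large"
```

Learning then amounts to choosing s_i and t_j so that past fit/small/large outcomes are reproduced, e.g. with a hinge-style loss on d.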

SLIDE 60

Recommending product sizes to customers Model fitting:

SLIDE 61

Recommending product sizes to customers Extensions:

  • Multi-dimensional sizes
  • Customer and product features
  • User personas
SLIDE 62

Recommending product sizes to customers Experiments:

SLIDE 63

Recommending product sizes to customers Experiments: Online A/B test

SLIDE 64

Recommending product sizes to customers

Morals of the story:

  • Very simple model that actually works well in production
  • Only a single parameter per user and per item!
SLIDE 65

Playlist prediction via Metric Embedding

SLIDE 66

Playlist prediction via Metric Embedding Goal: Build a recommender system that recommends sequences of songs. Idea: Might also use a metric embedding (consecutive songs should be “nearby” in some space).

SLIDE 67

Playlist prediction via Metric Embedding Basic model:

(compare with metric model from last lecture)
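The model formula on the slide is an image; as a hedged sketch of the single-point metric idea, transition probability can fall off with squared Euclidean distance between song embeddings (the softmax-over-negative-squared-distance form and the toy embeddings are assumptions):

```python
import numpy as np

# Toy 2-D song embeddings
songs = {"s1": np.array([0.0, 0.0]),
         "s2": np.array([0.1, 0.0]),
         "s3": np.array([3.0, 4.0])}

def transition_probs(prev):
    """P(next | prev) proportional to exp(-||x_next - x_prev||^2)."""
    keys = [k for k in songs if k != prev]
    logits = np.array([-np.sum((songs[k] - songs[prev]) ** 2) for k in keys])
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return dict(zip(keys, probs))

probs = transition_probs("s1")
print(probs)  # "s2" (nearby) is far more likely than "s3" (distant)
```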

SLIDE 68

Playlist prediction via Metric Embedding Basic model (“single point”):

SLIDE 69

Playlist prediction via Metric Embedding “Dual-point” model

SLIDE 70

Playlist prediction via Metric Embedding Extensions:

  • Popularity biases
SLIDE 71

Playlist prediction via Metric Embedding Extensions:

  • Personalization
SLIDE 72

Playlist prediction via Metric Embedding Extensions:

  • Semantic Tags
SLIDE 73

Playlist prediction via Metric Embedding Extensions:

  • Observable Features
SLIDE 74

Playlist prediction via Metric Embedding Experiments:

Yes.com playlists

  • Dec 2010 – May 2011

“Small” dataset:

  • 3,168 songs
  • 134,431 + 1,191,279 transitions

“Large” dataset:

  • 9,775 songs
  • 172,510 + 1,602,079 transitions
SLIDE 75

Playlist prediction via Metric Embedding Experiments:

SLIDE 76

Playlist prediction via Metric Embedding Experiments:

(results shown for the “Small” and “Large” datasets)

SLIDE 77

Playlist prediction via Metric Embedding

Morals of the story:

  • The metric assumption works well in settings other than “geographical” data!
  • However, it requires some modifications in order to work well (e.g. “start points” and “end points”)
  • Effective combination of latent + observed features, as well as metric + inner-product models

SLIDE 78

Efficient Natural Language Response Suggestion for Smart Reply

SLIDE 79

Efficient Natural Language Response Suggestion for Smart Reply Goal: Automatically suggest common responses to e-mails

SLIDE 80

Efficient Natural Language Response Suggestion for Smart Reply Basic setup

SLIDE 81

Efficient Natural Language Response Suggestion for Smart Reply Previous solution (KDD 2016)

  • Based on a seq2seq method
SLIDE 82

Efficient Natural Language Response Suggestion for Smart Reply Idea: Replace this (complex) solution with a simple multiclass classification-based solution

SLIDE 83

Efficient Natural Language Response Suggestion for Smart Reply Idea: Replace this (complex) solution with a simple multiclass classification-based solution

SLIDE 84

Efficient Natural Language Response Suggestion for Smart Reply Model: S(x,y)
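The details of S(x, y) are images in the slides; a hedged bag-of-words sketch of a dot-product scorer between an embedded message x and an embedded candidate response y (the vocabulary, embeddings, and averaging are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = {"meeting": 0, "tomorrow": 1, "sounds": 2, "good": 3, "thanks": 4}
E = rng.normal(size=(len(vocab), 4))  # toy word embeddings

def embed(text):
    """Bag-of-words embedding: average of the word vectors."""
    idx = [vocab[w] for w in text.split() if w in vocab]
    return E[idx].mean(axis=0)

def S(x, y):
    """Dot-product score between message x and candidate response y."""
    return float(embed(x) @ embed(y))

# Rank a fixed set of candidate responses for an incoming message:
candidates = ["sounds good", "thanks"]
msg = "meeting tomorrow"
ranked = sorted(candidates, key=lambda y: S(msg, y), reverse=True)
print(ranked)
```

Because the candidate set is fixed, response embeddings can be precomputed, making suggestion a fast nearest-neighbor-style lookup rather than sequence generation.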

SLIDE 85

Efficient Natural Language Response Suggestion for Smart Reply Model: Architecture v1

SLIDE 86

Efficient Natural Language Response Suggestion for Smart Reply Model: Architecture v2

SLIDE 87

Efficient Natural Language Response Suggestion for Smart Reply Model: Extensions

SLIDE 88

Efficient Natural Language Response Suggestion for Smart Reply Model: Extensions

SLIDE 89

Efficient Natural Language Response Suggestion for Smart Reply Experiments: (offline)

SLIDE 90

Efficient Natural Language Response Suggestion for Smart Reply Experiments: (online)

SLIDE 91

Efficient Natural Language Response Suggestion for Smart Reply

Morals:

  • Even a seemingly complex problem like natural-language response generation can be cast as a multiclass classification problem!
  • Even a simple bag-of-words model proved to be sufficient; no need to handle “grammar” etc.
  • Also, no personalization (though to what extent would this be possible with the data available?)

SLIDE 92

Overview

Morals:

  • State-of-the-art recommender systems (whether from academia or industry) are not so far from what we learned in class
  • All of them depended on some kind of maximum-likelihood expression, along with gradient ascent/descent!