CSE 258 Web Mining and Recommender Systems: Advanced Recommender Systems - PowerPoint PPT Presentation





SLIDE 1

CSE 258

Web Mining and Recommender Systems

Advanced Recommender Systems

SLIDE 2

This week

Methodological papers

  • Bayesian Personalized Ranking
  • Factorizing Personalized Markov Chains
  • Personalized Ranking Metric Embedding
  • Translation-based Recommendation
SLIDE 3

This week

Goals:

SLIDE 4

This week

Application papers (Wednesday)

  • Recommending Product Sizes to Customers
  • Playlist prediction via Metric Embedding
  • Efficient Natural Language Response Suggestion for

Smart Reply

  • Personalized Itinerary Recommendation with Queuing

Time Awareness

  • Learning Visual Clothing Style with Heterogeneous

Dyadic Co-occurrences

SLIDE 5

This week

We (hopefully?) know enough by now to…

  • Read academic papers on Recommender Systems
  • Understand most of the models and evaluations used

See also: CSE 291

SLIDE 6

Bayesian Personalized Ranking

SLIDE 7

Bayesian Personalized Ranking Goal: Estimate a personalized ranking function for each user

SLIDE 8

Bayesian Personalized Ranking

Why? Compare to “traditional” approach of replacing “missing values” by 0: But! “0”s aren’t necessarily negative!

SLIDE 9

Bayesian Personalized Ranking

Why? Compare to “traditional” approach of replacing “missing values” by 0: This suggests a possible solution based on ranking

SLIDE 10

Bayesian Personalized Ranking

Defn: AUC (for a user u)

Here the scoring function compares an item i to an item j for a user u.

The AUC essentially counts how many times the model correctly identifies that u prefers the item they bought (positive feedback) over an item they did not.

SLIDE 11

Bayesian Personalized Ranking

Defn: AUC (for a user u)

AUC = 1: we always guess correctly between two potential items i and j
AUC = 0.5: we guess no better than random
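Concretely, the per-user AUC can be computed by comparing every consumed item i against every unconsumed item j (a brute-force sketch; the function and argument names are mine, not the paper's):

```python
def user_auc(score, positives, negatives):
    """Fraction of pairs (i, j), with i consumed and j not, where the model ranks i above j."""
    correct = 0
    total = 0
    for i in positives:          # items the user consumed
        for j in negatives:      # items the user did not consume
            total += 1
            if score(i) > score(j):
                correct += 1
    return correct / total
```

Averaging this quantity over all users gives the overall AUC.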

SLIDE 12

Bayesian Personalized Ranking

Defn: AUC = Area Under the ROC Curve

SLIDE 13

Bayesian Personalized Ranking

Summary: Goal is to count how many times we identified i as being more preferable than j for a user u

SLIDE 14

Bayesian Personalized Ranking

Summary: Goal is to count how many times we identified i as being more preferable than j for a user u

SLIDE 15

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

The scoring function can be any function that compares the compatibility of i and j for a user u; e.g., it could be based on matrix factorization:
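The smoothed objective can then be optimized by stochastic gradient ascent. Below is a minimal sketch, assuming a matrix-factorization compatibility function x_uij = γ_u · (γ_i − γ_j); the dimensions, learning rate, and regularizer are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_users, n_items = 5, 10, 20                       # hypothetical sizes
gamma_u = rng.normal(scale=0.1, size=(n_users, K))    # user factors
gamma_i = rng.normal(scale=0.1, size=(n_items, K))    # item factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One SGD ascent step on ln sigma(x_uij), where u consumed i but not j."""
    gu, gi, gj = gamma_u[u].copy(), gamma_i[i].copy(), gamma_i[j].copy()
    x_uij = gu @ (gi - gj)
    dl = sigmoid(-x_uij)                 # d/dx ln sigma(x) = sigma(-x)
    gamma_u[u] += lr * (dl * (gi - gj) - reg * gu)
    gamma_i[i] += lr * (dl * gu - reg * gi)
    gamma_i[j] += lr * (-dl * gu - reg * gj)
```

After enough sampled (u, i, j) triples, each user's consumed items are scored above their unconsumed ones.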

SLIDE 16

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

SLIDE 17

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

SLIDE 18

Bayesian Personalized Ranking

Experiments:

  • RossMann (online drug store)
  • Netflix (treated as a binary problem)
SLIDE 19

Bayesian Personalized Ranking

Experiments:

SLIDE 20

Bayesian Personalized Ranking

Morals of the story:

  • Given a “one-class” prediction task (like purchase prediction) we might want to optimize a ranking function rather than trying to factorize a matrix directly
  • The AUC is one such measure that counts, among a user u, items they consumed (i), and items they did not consume (j), how often we correctly guessed that i was preferred by u
  • We can optimize this approximately by maximizing a smooth (differentiable) approximation of this count

SLIDE 21

Factorizing Personalized Markov Chains for Next-Basket Recommendation

SLIDE 22

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Goal: build temporal models just by looking at the item the user purchased previously


SLIDE 23

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Assumption: all of the information contained by temporal models is captured by the previous action; this is what’s known as a first-order Markov property

SLIDE 24

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Is this assumption realistic?

SLIDE 25

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Data setup: Rossmann basket data

SLIDE 26

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 27

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Could we try to compute such probabilities just by counting? Seems okay, as long as the item vocabulary is small (I^2 possible item-to-item combinations to count). But it’s not personalized.
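Estimating the (non-personalized) transition probabilities by counting is a single pass over the purchase sequences; a small sketch (the input layout is an assumption):

```python
from collections import defaultdict

def transition_probs(sequences):
    """MLE of first-order Markov transitions P(next | previous) from item sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):   # consecutive purchases
            counts[prev][nxt] += 1
    # normalize each row of counts into a probability distribution
    return {prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for prev, row in counts.items()}
```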

SLIDE 28

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize? Now we would have U*I^2 counts to compare. Clearly not feasible, so we need to try to estimate/model this quantity (e.g. by matrix factorization).
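One way to factorize this user-item-item transition tensor, loosely following FPMC's pairwise-interaction decomposition, is a user-item term plus a previous-item-item term; the factor names and dimensions below are illustrative, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_users, n_items = 4, 3, 6
V_ui = rng.normal(size=(n_users, K))   # user factors for the user-item interaction
V_iu = rng.normal(size=(n_items, K))   # item factors for the user-item interaction
V_li = rng.normal(size=(n_items, K))   # previous-item factors for the Markov interaction
V_il = rng.normal(size=(n_items, K))   # item factors for the Markov interaction

def fpmc_score(u, prev, i):
    """Score of item i for user u given the previous item: MF term + factorized Markov term."""
    return V_ui[u] @ V_iu[i] + V_li[prev] @ V_il[i]
```

Ranking items by this score personalizes the Markov chain without storing U*I^2 counts.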

SLIDE 29

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize?

SLIDE 30

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize?

SLIDE 31

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 32

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 33

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Results (F@5):

FMC: not personalized
MF: personalized, but not sequentially aware

SLIDE 34

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Morals of the story:

  • Can improve performance by modeling third-order interactions between the user, the item, and the previous item
  • This is simpler than temporal models, but makes a big assumption
  • Given the blowup in the interaction space, this can be handled by tensor decomposition techniques

SLIDE 35

Personalized Ranking Metric Embedding for Next New POI Recommendation

SLIDE 36

Personalized Ranking Metric Embedding for Next New POI Recommendation

Goal: Can we build better sequential recommendation models by using the idea of metric embeddings?

vs.

SLIDE 37

Personalized Ranking Metric Embedding for Next New POI Recommendation

Why would we expect this to work (or not)?

SLIDE 38

Personalized Ranking Metric Embedding for Next New POI Recommendation

Otherwise, goal is the same as the previous paper:

SLIDE 39

Personalized Ranking Metric Embedding for Next New POI Recommendation

Data

SLIDE 40

Personalized Ranking Metric Embedding for Next New POI Recommendation

Qualitative analysis

SLIDE 41

Personalized Ranking Metric Embedding for Next New POI Recommendation

Qualitative analysis

SLIDE 42

Personalized Ranking Metric Embedding for Next New POI Recommendation

Basic model (not personalized)

SLIDE 43

Personalized Ranking Metric Embedding for Next New POI Recommendation

Basic model (not personalized)

SLIDE 44

Personalized Ranking Metric Embedding for Next New POI Recommendation

Personalized version

SLIDE 45

Personalized Ranking Metric Embedding for Next New POI Recommendation

Personalized version
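The personalized model (whose formula was an image in the original slide) combines two metric spaces: a sequential space of POI points and a preference space containing both users and POIs, mixed by a weight α. A sketch under those assumptions, with hypothetical array names and sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
K, n_users, n_pois = 3, 2, 5
Xp_user = rng.normal(size=(n_users, K))   # users in the preference space
Xp_item = rng.normal(size=(n_pois, K))    # POIs in the preference space
Xs_item = rng.normal(size=(n_pois, K))    # POIs in the sequential space

def prme_distance(u, prev, i, alpha=0.2):
    """Weighted sum of squared distances; smaller means a more likely next POI."""
    d_pref = np.sum((Xp_user[u] - Xp_item[i]) ** 2)
    d_seq = np.sum((Xs_item[prev] - Xs_item[i]) ** 2)
    return alpha * d_pref + (1 - alpha) * d_seq
```

Recommendation then amounts to ranking candidate POIs by increasing distance.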

SLIDE 46

Personalized Ranking Metric Embedding for Next New POI Recommendation

Learning

SLIDE 47

Personalized Ranking Metric Embedding for Next New POI Recommendation

Results

SLIDE 48

Personalized Ranking Metric Embedding for Next New POI Recommendation

Morals of the story:

  • In some applications, metric embeddings might be better than inner products
  • Examples could include geographical data, but also others (e.g. playlists?)
SLIDE 49

Translation-based Recommendation

SLIDE 50

Goal: (e.g.) which movie is this user going to watch next?

viewing history of

Want models that consider

  • characteristics/preferences of each user
  • local context, i.e., the last consumed item(s)

Translation-based Recommendation

SLIDE 51

Option 1: Matrix Factorization

Translation-based Recommendation

Goal: (e.g.) which movie is this user going to watch next?

viewing history of

SLIDE 52

Translation-based Recommendation

Goal: (e.g.) which movie is this user going to watch next?

viewing history of

Option 2: Markov Chains

SLIDE 53

Idea: Considering the two simultaneously means modeling the interactions between a user and adjacent items

(diagram: a user's transition from the previous item to the next item)

Translation-based Recommendation

SLIDE 54

(formula terms: user preference + local context)

Translation-based Recommendation

Compare: Factorized Personalized Markov Chains (earlier today)

SLIDE 55

an additional hyperparameter to balance the two components

Translation-based Recommendation

Compare: Personalized Ranking Metric Embedding (earlier today)

SLIDE 56

average/max pooling, etc.

Translation-based Recommendation

Compare: Hierarchical Representation Model (HRM) Wang et al., 2015 (earlier today)

SLIDE 57

average/max pooling, etc.

Translation-based Recommendation

Compare: Hierarchical Representation Model (HRM), Wang et al., 2015 (earlier today). Goal: try to get the “best of both worlds,” by modeling third-order interactions and using metric embeddings

SLIDE 58

Detour: Translation models in Knowledge Bases

Data: entities; links (multiple types of relationships)
Goal: predict unseen links
State-of-the-art method: ‘relationships as translations’, e.g. [Bordes et al., 2013], [Wang et al., 2014], [Lin et al., 2015]

Training example: a triple (entity h, relation r, entity t)
Basic idea: h + r ≈ t

Translation-based Recommendation

SLIDE 59

Users as translation vectors; items as points in the embedding space

Training triplet: (user, previous item, next item)
Objective: previous item + user ≈ next item

Translation-based Recommendation

SLIDE 60

Translation-based Recommendation

Users as translation vectors; items as points in the embedding space

SLIDE 61

(formula term: item bias)

  • Benefit from using metric embeddings
  • Model (u, i, j) with a single component
  • Recommendations can be made by a simple NN search
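Under those assumptions, the TransRec score can be sketched as an item bias minus the distance between the translated previous item and a candidate; modeling the translation as a global vector plus a per-user offset is one common formulation, and all names and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
K, n_users, n_items = 3, 2, 5
t_global = rng.normal(scale=0.1, size=K)           # translation shared by all users
t_user = rng.normal(scale=0.1, size=(n_users, K))  # per-user offset
gamma = rng.normal(size=(n_items, K))              # item points
beta = rng.normal(size=n_items)                    # item biases

def transrec_score(u, prev, j):
    """Higher is better: item bias minus distance from the translated previous item."""
    translated = gamma[prev] + t_global + t_user[u]
    return beta[j] - np.linalg.norm(translated - gamma[j])

def recommend_next(u, prev):
    """The NN search from the slide: pick the item nearest the translated point (plus bias)."""
    return max(range(len(gamma)), key=lambda j: transrec_score(u, prev, j))
```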

Translation-based Recommendation

SLIDE 62

Translation-based Recommendation

SLIDE 63
  • Automotives
  • Office Products
  • Toys & Games
  • Video Games
  • Cell Phones & Accessories
  • Clothing, Shoes, and Jewelry
  • Electronics

May 1996 - July 2014

Translation-based Recommendation

SLIDE 64

  • check-ins at different venues (Dec. 2011 - Apr. 2012)
  • user reviews (Jan. 2001 - Nov. 2013)
  • movie ratings (Nov. 2005 - Nov. 2009)

(all available online)

Translation-based Recommendation

SLIDE 65

11.4M reviews & ratings of 4.5M users on 3.1M local businesses

restaurants, hotels, parks, shopping malls, movie theaters, schools, military recruiting offices, bird control, mediation services ...

Characteristics: vast vocabulary of items, variability, and sparsity
http://cseweb.ucsd.edu/~jmcauley/

Translation-based Recommendation

SLIDE 66

Translation-based Recommendation

SLIDE 67

varying sparsity

Translation-based Recommendation

SLIDE 68

Unified

Translation-based Recommendation

SLIDE 69

TransRec

Translation-based Recommendation

SLIDE 70

Translation-based Recommendation

Works well with…
Doesn’t work well with…

SLIDE 71

Overview

Morals of the story:

  • Today we looked at two main ideas that extend the recommender systems we saw in class:
  • 1. Sequential Recommendation: most of the dynamics due to time can be captured purely by knowing the sequence of items
  • 2. Metric Recommendation: in some settings, using inner products may not be the correct assumption

SLIDE 72

Assignment 1

SLIDE 73

Assignment 1

SLIDE 74

CSE 258

Web Mining and Recommender Systems

Real-world applications of recommender systems

SLIDE 75

Recommending product sizes to customers

SLIDE 76

Recommending product sizes to customers Goal: Build a recommender system that predicts whether an item will “fit”:

SLIDE 77

Recommending product sizes to customers Challenges:

  • Data sparsity: people have very few purchases from which to estimate size
  • Cold-start: how to handle new customers and products with no past purchases?
  • Multiple personas: several customers may use the same account

SLIDE 78

Recommending product sizes to customers Data:

  • Shoe transactions from Amazon.com
  • For each shoe j, we have a reported size c_j (from the manufacturer), but this may not be correct!
  • Need to estimate the customer’s size (s_i), as well as the product’s true size (t_j)
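One way to turn the estimated sizes s_i and t_j into a learnable objective is a set of hinge losses around a "fit window" on the difference s_i − t_j. The window width b and the exact loss shapes below are illustrative assumptions, not the paper's formulation:

```python
def size_loss(s_i, t_j, outcome, b=0.5):
    """Hinge losses around a fit window [-b, b] on f = s_i - t_j (illustrative).
    outcome is the customer's report: "fit", "small" (item too small), or "large"."""
    f = s_i - t_j
    if outcome == "fit":
        # penalize only if f falls outside the window
        return max(0.0, abs(f) - b)
    if outcome == "small":
        # item ran small: customer size should exceed product size, expect f > b
        return max(0.0, b - f)
    # outcome == "large": item ran big, expect f < -b
    return max(0.0, f + b)
```

Minimizing this over all transactions jointly fits s_i and t_j.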

SLIDE 79

Recommending product sizes to customers Loss function:

SLIDE 80

Recommending product sizes to customers Loss function:

SLIDE 81

Recommending product sizes to customers Loss function:

SLIDE 82

Recommending product sizes to customers

SLIDE 83

Recommending product sizes to customers Loss function:

SLIDE 84

Recommending product sizes to customers Model fitting:

SLIDE 85

Recommending product sizes to customers Extensions:

  • Multi-dimensional sizes
  • Customer and product features
  • User personas
SLIDE 86

Recommending product sizes to customers Experiments:

SLIDE 87

Recommending product sizes to customers Experiments: Online A/B test

SLIDE 88

Playlist prediction via Metric Embedding

SLIDE 89

Playlist prediction via Metric Embedding

Goal: Build a recommender system that recommends sequences of songs
Idea: Might also use a metric embedding (consecutive songs should be “nearby” in some space)

SLIDE 90

Playlist prediction via Metric Embedding Basic model:

(compare with metric model from last lecture)
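In the single-point model, the probability of the next song is (up to normalization) a softmax over negative squared distances from the current song; a sketch with hypothetical embeddings (array names and sizes are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
K, n_songs = 2, 6
X = rng.normal(size=(n_songs, K))   # one point per song (hypothetical embedding)

def transition_prob(i, j):
    """P(next = j | current = i): softmax over negative squared distances from song i."""
    logits = -np.sum((X - X[i]) ** 2, axis=1)
    p = np.exp(logits - logits.max())   # subtract max for numerical stability
    p /= p.sum()
    return p[j]
```

Songs close to the current one in the embedding space get high transition probability.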

SLIDE 91

Playlist prediction via Metric Embedding Basic model (“single point”):

SLIDE 92

Playlist prediction via Metric Embedding “Dual-point” model

SLIDE 93

Playlist prediction via Metric Embedding Extensions:

  • Popularity biases
SLIDE 94

Playlist prediction via Metric Embedding Extensions:

  • Personalization
SLIDE 95

Playlist prediction via Metric Embedding Extensions:

  • Semantic Tags
SLIDE 96

Playlist prediction via Metric Embedding Extensions:

  • Observable Features
SLIDE 97

Playlist prediction via Metric Embedding Experiments:

Yes.com playlists

  • Dec 2010 – May 2011

“Small” dataset:

  • 3,168 songs
  • 134,431 + 1,191,279 transitions

“Large” dataset

  • 9,775 songs
  • 172,510 + 1,602,079 transitions
SLIDE 98

Playlist prediction via Metric Embedding Experiments:

SLIDE 99

Playlist prediction via Metric Embedding Experiments:

(results on the “Small” and “Large” datasets)

SLIDE 100

Efficient Natural Language Response Suggestion for Smart Reply

SLIDE 101

Efficient Natural Language Response Suggestion for Smart Reply Goal: Automatically suggest common responses to e-mails

SLIDE 102

Efficient Natural Language Response Suggestion for Smart Reply Basic setup

SLIDE 103

Efficient Natural Language Response Suggestion for Smart Reply Previous solution (KDD 2016)

  • Based on a seq2seq method
SLIDE 104

Efficient Natural Language Response Suggestion for Smart Reply Idea: Replace this (complex) solution with a simple multiclass classification-based solution

SLIDE 105

Efficient Natural Language Response Suggestion for Smart Reply Idea: Replace this (complex) solution with a simple multiclass classification-based solution

SLIDE 106

Efficient Natural Language Response Suggestion for Smart Reply Model: S(x,y)
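The scoring model S(x, y) reduces to a dot product between a message embedding and precomputed response embeddings, which makes serving a top-k search over a fixed response whitelist; a sketch with hypothetical embeddings (the encoder networks producing h(·) are not shown):

```python
import numpy as np

rng = np.random.default_rng(5)
D = 8
R = rng.normal(size=(50, D))   # precomputed embeddings of whitelisted responses

def top_responses(h_message, k=3):
    """Score S(x, y) = h(x) . h(y) for every response; return indices of the top k."""
    scores = R @ h_message
    return np.argsort(-scores)[:k]
```

Because the response embeddings are fixed, this search can also be accelerated with approximate nearest-neighbor structures.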

SLIDE 107

Efficient Natural Language Response Suggestion for Smart Reply Model: Architecture v1

SLIDE 108

Efficient Natural Language Response Suggestion for Smart Reply Model: Architecture v2

SLIDE 109

Efficient Natural Language Response Suggestion for Smart Reply Model: Extensions

SLIDE 110

Efficient Natural Language Response Suggestion for Smart Reply Model: Extensions

SLIDE 111

Efficient Natural Language Response Suggestion for Smart Reply Experiments: (offline)

SLIDE 112

Efficient Natural Language Response Suggestion for Smart Reply Experiments: (online)

SLIDE 113

Personalized Itinerary Recommendation with Queuing Time Awareness

SLIDE 114

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

SLIDE 115

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences Goal: Identify items that might be purchased together

SLIDE 116

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

SLIDE 117

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

browsed together (substitutable)
bought together (complementary)

SLIDE 118

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

Four types of relationship:
1) People who viewed X also viewed Y
2) People who viewed X eventually bought Y
3) People who bought X also bought Y
4) People who bought X and Y together
Substitutes (1 and 2), and complements (3 and 4)

SLIDE 119

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

1) Data collection

SLIDE 120

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

2) Training data generation

SLIDE 121

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

2) Training

(simpler models)

SLIDE 122

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

2) Training

(simpler models)

SLIDE 123

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

3) Siamese CNNs
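Siamese networks of this kind are commonly trained with a contrastive loss on pairs of image embeddings: compatible pairs are pulled together, incompatible pairs pushed apart. A minimal sketch; the margin and the use of squared distances are generic assumptions, not necessarily the paper's exact objective:

```python
import numpy as np

def contrastive_loss(e1, e2, compatible, margin=1.0):
    """Pull embeddings of compatible items together; push incompatible ones past the margin."""
    d = np.linalg.norm(e1 - e2)
    if compatible:
        return d ** 2                      # compatible: penalize any distance
    return max(0.0, margin - d) ** 2       # incompatible: penalize only if closer than margin
```

Here e1 and e2 would be the outputs of the two (weight-sharing) CNN branches.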

SLIDE 124

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

4) Recommendation

SLIDE 125

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

SLIDE 126

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences