CSE 258 Web Mining and Recommender Systems: Advanced Recommender Systems - PowerPoint PPT Presentation





SLIDE 1

CSE 258

Web Mining and Recommender Systems

Advanced Recommender Systems

SLIDE 2

This week

Methodological papers

  • Bayesian Personalized Ranking
  • Factorizing Personalized Markov Chains
  • Personalized Ranking Metric Embedding
  • Translation-based Recommendation
SLIDE 3

This week

Goals:

SLIDE 4

This week

Application papers (Wednesday)

  • Recommending Product Sizes to Customers
  • Playlist prediction via Metric Embedding
  • Efficient Natural Language Response Suggestion for

Smart Reply

  • Personalized Itinerary Recommendation with Queuing

Time Awareness

  • Learning Visual Clothing Style with Heterogeneous

Dyadic Co-occurrences

SLIDE 5

This week

We (hopefully?) know enough by now to…

  • Read academic papers on Recommender Systems
  • Understand most of the models and evaluations used

See also: CSE 291

SLIDE 6

Bayesian Personalized Ranking

SLIDE 7

Bayesian Personalized Ranking Goal: Estimate a personalized ranking function for each user

SLIDE 8

Bayesian Personalized Ranking

Why? Compare to “traditional” approach of replacing “missing values” by 0: But! “0”s aren’t necessarily negative!

SLIDE 9

Bayesian Personalized Ranking

Why? Compare to “traditional” approach of replacing “missing values” by 0: This suggests a possible solution based on ranking

SLIDE 10

Bayesian Personalized Ranking

Defn: AUC (for a user u)

Here the scoring function compares an item i to an item j for a user u.

The AUC essentially counts how many times the model correctly identifies that u prefers the item they bought (positive feedback) over an item they did not.

SLIDE 11

Bayesian Personalized Ranking

Defn: AUC (for a user u)

AUC = 1: we always guess correctly between two potential items i and j
AUC = 0.5: we guess no better than random
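Concretely, the per-user AUC can be computed by comparing every consumed item i against every unconsumed item j (a brute-force sketch; the function and argument names are mine, not the paper's):

```python
def user_auc(score, positives, negatives):
    """Fraction of pairs (i, j), with i consumed and j not, where the model ranks i above j."""
    correct = 0
    total = 0
    for i in positives:          # items the user consumed
        for j in negatives:      # items the user did not consume
            total += 1
            if score(i) > score(j):
                correct += 1
    return correct / total
```

Averaging this quantity over all users gives the overall AUC.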

SLIDE 12

Bayesian Personalized Ranking

Defn: AUC = Area Under the ROC Curve

SLIDE 13

Bayesian Personalized Ranking

Summary: Goal is to count how many times we identified i as being more preferable than j for a user u

SLIDE 14

Bayesian Personalized Ranking

Summary: Goal is to count how many times we identified i as being more preferable than j for a user u

SLIDE 15

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

The scoring function can be any function that compares the compatibility of i and j for a user u; e.g., it could be based on matrix factorization:
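The smoothed objective can then be optimized by stochastic gradient ascent. Below is a minimal sketch, assuming a matrix-factorization compatibility function x_uij = γ_u · (γ_i − γ_j); the dimensions, learning rate, and regularizer are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_users, n_items = 5, 10, 20                       # hypothetical sizes
gamma_u = rng.normal(scale=0.1, size=(n_users, K))    # user factors
gamma_i = rng.normal(scale=0.1, size=(n_items, K))    # item factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One SGD ascent step on ln sigma(x_uij), where u consumed i but not j."""
    gu, gi, gj = gamma_u[u].copy(), gamma_i[i].copy(), gamma_i[j].copy()
    x_uij = gu @ (gi - gj)
    dl = sigmoid(-x_uij)                 # d/dx ln sigma(x) = sigma(-x)
    gamma_u[u] += lr * (dl * (gi - gj) - reg * gu)
    gamma_i[i] += lr * (dl * gu - reg * gi)
    gamma_i[j] += lr * (-dl * gu - reg * gj)
```

After enough sampled (u, i, j) triples, each user's consumed items are scored above their unconsumed ones.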

SLIDE 16

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

SLIDE 17

Bayesian Personalized Ranking

Idea: Replace the counting function by a smooth function

SLIDE 18

Bayesian Personalized Ranking

Experiments:

  • RossMann (online drug store)
  • Netflix (treated as a binary problem)
SLIDE 19

Bayesian Personalized Ranking

Experiments:

SLIDE 20

Bayesian Personalized Ranking

Morals of the story:

  • Given a “one-class” prediction task (like purchase prediction) we might want to optimize a ranking function rather than trying to factorize a matrix directly
  • The AUC is one such measure that counts, among a user u, items they consumed (i), and items they did not consume (j), how often we correctly guessed that i was preferred by u
  • We can optimize this approximately by maximizing a smooth (differentiable) approximation of this count

SLIDE 21

Factorizing Personalized Markov Chains for Next-Basket Recommendation

SLIDE 22

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Goal: build temporal models just by looking at the item the user purchased previously


SLIDE 23

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Assumption: all of the information contained by temporal models is captured by the previous action; this is what’s known as a first-order Markov property

SLIDE 24

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Is this assumption realistic?

SLIDE 25

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Data setup: Rossmann basket data

SLIDE 26

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 27

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Could we try to compute such probabilities just by counting? Seems okay, as long as the item vocabulary is small (I^2 possible item-to-item combinations to count). But it’s not personalized.
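Estimating the (non-personalized) transition probabilities by counting is a single pass over the purchase sequences; a small sketch (the input layout is an assumption):

```python
from collections import defaultdict

def transition_probs(sequences):
    """MLE of first-order Markov transitions P(next | previous) from item sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):   # consecutive purchases
            counts[prev][nxt] += 1
    # normalize each row of counts into a probability distribution
    return {prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for prev, row in counts.items()}
```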

SLIDE 28

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize? Now we would have U*I^2 counts to compare. Clearly not feasible, so we need to try to estimate/model this quantity (e.g. by matrix factorization).
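One way to factorize this user-item-item transition tensor, loosely following FPMC's pairwise-interaction decomposition, is a user-item term plus a previous-item-item term; the factor names and dimensions below are illustrative, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_users, n_items = 4, 3, 6
V_ui = rng.normal(size=(n_users, K))   # user factors for the user-item interaction
V_iu = rng.normal(size=(n_items, K))   # item factors for the user-item interaction
V_li = rng.normal(size=(n_items, K))   # previous-item factors for the Markov interaction
V_il = rng.normal(size=(n_items, K))   # item factors for the Markov interaction

def fpmc_score(u, prev, i):
    """Score of item i for user u given the previous item: MF term + factorized Markov term."""
    return V_ui[u] @ V_iu[i] + V_li[prev] @ V_il[i]
```

Ranking items by this score personalizes the Markov chain without storing U*I^2 counts.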

SLIDE 29

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize?

SLIDE 30

Factorizing Personalized Markov Chains for Next-Basket Recommendation

What if we try to personalize?

SLIDE 31

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 32

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Prediction task:

SLIDE 33

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Results (F@5):

FMC: not personalized
MF: personalized, but not sequentially aware

SLIDE 34

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Morals of the story:

  • Can improve performance by modeling third-order interactions between the user, the item, and the previous item
  • This is simpler than temporal models, but makes a big assumption
  • Given the blowup in the interaction space, this can be handled by tensor decomposition techniques

SLIDE 35

Personalized Ranking Metric Embedding for Next New POI Recommendation

SLIDE 36

Personalized Ranking Metric Embedding for Next New POI Recommendation

Goal: Can we build better sequential recommendation models by using the idea of metric embeddings?

vs.

SLIDE 37

Personalized Ranking Metric Embedding for Next New POI Recommendation

Why would we expect this to work (or not)?

SLIDE 38

Personalized Ranking Metric Embedding for Next New POI Recommendation

Otherwise, goal is the same as the previous paper:

SLIDE 39

Personalized Ranking Metric Embedding for Next New POI Recommendation

Data

SLIDE 40

Personalized Ranking Metric Embedding for Next New POI Recommendation

Qualitative analysis

SLIDE 41

Personalized Ranking Metric Embedding for Next New POI Recommendation

Qualitative analysis

SLIDE 42

Personalized Ranking Metric Embedding for Next New POI Recommendation

Basic model (not personalized)

SLIDE 43

Personalized Ranking Metric Embedding for Next New POI Recommendation

Basic model (not personalized)

SLIDE 44

Personalized Ranking Metric Embedding for Next New POI Recommendation

Personalized version

SLIDE 45

Personalized Ranking Metric Embedding for Next New POI Recommendation

Personalized version
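The personalized model (whose formula was an image in the original slide) combines two metric spaces: a sequential space of POI points and a preference space containing both users and POIs, mixed by a weight α. A sketch under those assumptions, with hypothetical array names and sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
K, n_users, n_pois = 3, 2, 5
Xp_user = rng.normal(size=(n_users, K))   # users in the preference space
Xp_item = rng.normal(size=(n_pois, K))    # POIs in the preference space
Xs_item = rng.normal(size=(n_pois, K))    # POIs in the sequential space

def prme_distance(u, prev, i, alpha=0.2):
    """Weighted sum of squared distances; smaller means a more likely next POI."""
    d_pref = np.sum((Xp_user[u] - Xp_item[i]) ** 2)
    d_seq = np.sum((Xs_item[prev] - Xs_item[i]) ** 2)
    return alpha * d_pref + (1 - alpha) * d_seq
```

Recommendation then amounts to ranking candidate POIs by increasing distance.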

SLIDE 46

Personalized Ranking Metric Embedding for Next New POI Recommendation

Learning

SLIDE 47

Personalized Ranking Metric Embedding for Next New POI Recommendation

Results

SLIDE 48

Personalized Ranking Metric Embedding for Next New POI Recommendation

Morals of the story:

  • In some applications, metric embeddings might be better than inner products
  • Examples could include geographical data, but also others (e.g. playlists?)
SLIDE 49

Translation-based Recommendation

SLIDE 50

Goal: (e.g.) which movie is this user going to watch next?

viewing history of

Want models that consider

  • characteristics/preferences of each user
  • local context, i.e., the last consumed item(s)

Translation-based Recommendation

SLIDE 51

Option 1: Matrix Factorization

Translation-based Recommendation

Goal: (e.g.) which movie is this user going to watch next?

viewing history of

SLIDE 52

Translation-based Recommendation

Goal: (e.g.) which movie is this user going to watch next?

viewing history of

Option 2: Markov Chains

SLIDE 53

Idea: Considering the two simultaneously means modeling the interactions between a user and adjacent items

(diagram: a user's transition from the previous item to the next item)

Translation-based Recommendation

SLIDE 54

(formula terms: user preference + local context)

Translation-based Recommendation

Compare: Factorized Personalized Markov Chains (earlier today)

SLIDE 55

an additional hyperparameter to balance the two components

Translation-based Recommendation

Compare: Personalized Ranking Metric Embedding (earlier today)

SLIDE 56

average/max pooling, etc.

Translation-based Recommendation

Compare: Hierarchical Representation Model (HRM) Wang et al., 2015 (earlier today)

SLIDE 57

average/max pooling, etc.

Translation-based Recommendation

Compare: Hierarchical Representation Model (HRM), Wang et al., 2015 (earlier today). Goal: try to get the “best of both worlds,” by modeling third-order interactions and using metric embeddings

SLIDE 58

Detour: Translation models in Knowledge Bases

Data: entities; links (multiple types of relationships)
Goal: predict unseen links
State-of-the-art method: ‘relationships as translations’, e.g. [Bordes et al., 2013], [Wang et al., 2014], [Lin et al., 2015]

Training example: a triple (entity h, relation r, entity t)
Basic idea: h + r ≈ t

Translation-based Recommendation

SLIDE 59

Users as translation vectors; items as points in the embedding space

Training triplet: (user, previous item, next item)
Objective: previous item + user ≈ next item

Translation-based Recommendation

SLIDE 60

Translation-based Recommendation

Users as translation vectors; items as points in the embedding space

SLIDE 61

(formula term: item bias)

  • Benefit from using metric embeddings
  • Model (u, i, j) with a single component
  • Recommendations can be made by a simple NN search
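Under those assumptions, the TransRec score can be sketched as an item bias minus the distance between the translated previous item and a candidate; modeling the translation as a global vector plus a per-user offset is one common formulation, and all names and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
K, n_users, n_items = 3, 2, 5
t_global = rng.normal(scale=0.1, size=K)           # translation shared by all users
t_user = rng.normal(scale=0.1, size=(n_users, K))  # per-user offset
gamma = rng.normal(size=(n_items, K))              # item points
beta = rng.normal(size=n_items)                    # item biases

def transrec_score(u, prev, j):
    """Higher is better: item bias minus distance from the translated previous item."""
    translated = gamma[prev] + t_global + t_user[u]
    return beta[j] - np.linalg.norm(translated - gamma[j])

def recommend_next(u, prev):
    """The NN search from the slide: pick the item nearest the translated point (plus bias)."""
    return max(range(len(gamma)), key=lambda j: transrec_score(u, prev, j))
```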

Translation-based Recommendation

SLIDE 62

Translation-based Recommendation

SLIDE 63
  • Automotives
  • Office Products
  • Toys & Games
  • Video Games
  • Cell Phones & Accessories
  • Clothing, Shoes, and Jewelry
  • Electronics

May 1996 - July 2014

Translation-based Recommendation

SLIDE 64

  • check-ins at different venues (Dec. 2011 - Apr. 2012)
  • user reviews (Jan. 2001 - Nov. 2013)
  • movie ratings (Nov. 2005 - Nov. 2009)

(all available online)

Translation-based Recommendation

SLIDE 65

11.4M reviews & ratings of 4.5M users on 3.1M local businesses

restaurants, hotels, parks, shopping malls, movie theaters, schools, military recruiting offices, bird control, mediation services ...

Characteristics: vast vocabulary of items, variability, and sparsity
http://cseweb.ucsd.edu/~jmcauley/

Translation-based Recommendation

SLIDE 66

Translation-based Recommendation

SLIDE 67

varying sparsity

Translation-based Recommendation

SLIDE 68

Unified

Translation-based Recommendation

SLIDE 69

TransRec

Translation-based Recommendation

SLIDE 70

Translation-based Recommendation

Works well with…
Doesn’t work well with…

SLIDE 71

Overview

Morals of the story:

  • Today we looked at two main ideas that extend the recommender systems we saw in class:
  • 1. Sequential Recommendation: most of the dynamics due to time can be captured purely by knowing the sequence of items
  • 2. Metric Recommendation: in some settings, using inner products may not be the correct assumption

SLIDE 72

Assignment 1

SLIDE 73

Assignment 1

SLIDE 74

CSE 258

Web Mining and Recommender Systems

Real-world applications of recommender systems

SLIDE 75

Recommending product sizes to customers

SLIDE 76

Recommending product sizes to customers Goal: Build a recommender system that predicts whether an item will “fit”:

SLIDE 77

Recommending product sizes to customers Challenges:

  • Data sparsity: people have very few purchases from which to estimate size
  • Cold-start: how to handle new customers and products with no past purchases?
  • Multiple personas: several customers may use the same account

SLIDE 78

Recommending product sizes to customers Data:

  • Shoe transactions from Amazon.com
  • For each shoe j, we have a reported size c_j (from the manufacturer), but this may not be correct!
  • Need to estimate the customer’s size (s_i), as well as the product’s true size (t_j)
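One way to turn the estimated sizes s_i and t_j into a learnable objective is a set of hinge losses around a "fit window" on the difference s_i − t_j. The window width b and the exact loss shapes below are illustrative assumptions, not the paper's formulation:

```python
def size_loss(s_i, t_j, outcome, b=0.5):
    """Hinge losses around a fit window [-b, b] on f = s_i - t_j (illustrative).
    outcome is the customer's report: "fit", "small" (item too small), or "large"."""
    f = s_i - t_j
    if outcome == "fit":
        # penalize only if f falls outside the window
        return max(0.0, abs(f) - b)
    if outcome == "small":
        # item ran small: customer size should exceed product size, expect f > b
        return max(0.0, b - f)
    # outcome == "large": item ran big, expect f < -b
    return max(0.0, f + b)
```

Minimizing this over all transactions jointly fits s_i and t_j.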

SLIDE 79

Recommending product sizes to customers Loss function:

SLIDE 80

Recommending product sizes to customers Loss function:

SLIDE 81

Recommending product sizes to customers Loss function:

SLIDE 82

Recommending product sizes to customers

SLIDE 83

Recommending product sizes to customers Loss function:

SLIDE 84

Recommending product sizes to customers Model fitting:

SLIDE 85

Recommending product sizes to customers Extensions:

  • Multi-dimensional sizes
  • Customer and product features
  • User personas
SLIDE 86

Recommending product sizes to customers Experiments:

SLIDE 87

Recommending product sizes to customers Experiments: Online A/B test

SLIDE 88

Playlist prediction via Metric Embedding

SLIDE 89

Playlist prediction via Metric Embedding

Goal: Build a recommender system that recommends sequences of songs
Idea: Might also use a metric embedding (consecutive songs should be “nearby” in some space)

SLIDE 90

Playlist prediction via Metric Embedding Basic model:

(compare with metric model from last lecture)
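In the single-point model, the probability of the next song is (up to normalization) a softmax over negative squared distances from the current song; a sketch with hypothetical embeddings (array names and sizes are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
K, n_songs = 2, 6
X = rng.normal(size=(n_songs, K))   # one point per song (hypothetical embedding)

def transition_prob(i, j):
    """P(next = j | current = i): softmax over negative squared distances from song i."""
    logits = -np.sum((X - X[i]) ** 2, axis=1)
    p = np.exp(logits - logits.max())   # subtract max for numerical stability
    p /= p.sum()
    return p[j]
```

Songs close to the current one in the embedding space get high transition probability.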

SLIDE 91

Playlist prediction via Metric Embedding Basic model (“single point”):

SLIDE 92

Playlist prediction via Metric Embedding “Dual-point” model

SLIDE 93

Playlist prediction via Metric Embedding Extensions:

  • Popularity biases
SLIDE 94

Playlist prediction via Metric Embedding Extensions:

  • Personalization
SLIDE 95

Playlist prediction via Metric Embedding Extensions:

  • Semantic Tags
SLIDE 96

Playlist prediction via Metric Embedding Extensions:

  • Observable Features
SLIDE 97

Playlist prediction via Metric Embedding Experiments:

Yes.com playlists

  • Dec 2010 – May 2011

“Small” dataset:

  • 3,168 songs
  • 134,431 + 1,191,279 transitions

“Large” dataset

  • 9,775 songs
  • 172,510 + 1,602,079 transitions
SLIDE 98

Playlist prediction via Metric Embedding Experiments:

SLIDE 99

Playlist prediction via Metric Embedding Experiments:

(results on the “Small” and “Large” datasets)

SLIDE 100

Efficient Natural Language Response Suggestion for Smart Reply

SLIDE 101

Efficient Natural Language Response Suggestion for Smart Reply Goal: Automatically suggest common responses to e-mails

SLIDE 102

Efficient Natural Language Response Suggestion for Smart Reply Basic setup

SLIDE 103

Efficient Natural Language Response Suggestion for Smart Reply Previous solution (KDD 2016)

  • Based on a seq2seq method
SLIDE 104

Efficient Natural Language Response Suggestion for Smart Reply Idea: Replace this (complex) solution with a simple multiclass classification-based solution

SLIDE 105

Efficient Natural Language Response Suggestion for Smart Reply Idea: Replace this (complex) solution with a simple multiclass classification-based solution

SLIDE 106

Efficient Natural Language Response Suggestion for Smart Reply Model: S(x,y)
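The scoring model S(x, y) reduces to a dot product between a message embedding and precomputed response embeddings, which makes serving a top-k search over a fixed response whitelist; a sketch with hypothetical embeddings (the encoder networks producing h(·) are not shown):

```python
import numpy as np

rng = np.random.default_rng(5)
D = 8
R = rng.normal(size=(50, D))   # precomputed embeddings of whitelisted responses

def top_responses(h_message, k=3):
    """Score S(x, y) = h(x) . h(y) for every response; return indices of the top k."""
    scores = R @ h_message
    return np.argsort(-scores)[:k]
```

Because the response embeddings are fixed, this search can also be accelerated with approximate nearest-neighbor structures.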

SLIDE 107

Efficient Natural Language Response Suggestion for Smart Reply Model: Architecture v1

SLIDE 108

Efficient Natural Language Response Suggestion for Smart Reply Model: Architecture v2

SLIDE 109

Efficient Natural Language Response Suggestion for Smart Reply Model: Extensions

SLIDE 110

Efficient Natural Language Response Suggestion for Smart Reply Model: Extensions

SLIDE 111

Efficient Natural Language Response Suggestion for Smart Reply Experiments: (offline)

SLIDE 112

Efficient Natural Language Response Suggestion for Smart Reply Experiments: (online)

SLIDE 113

Personalized Itinerary Recommendation with Queuing Time Awareness

SLIDE 114

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

SLIDE 115

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences Goal: Identify items that might be purchased together

SLIDE 116

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

SLIDE 117

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

browsed together (substitutable)
bought together (complementary)

SLIDE 118

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

Four types of relationship:
1) People who viewed X also viewed Y
2) People who viewed X eventually bought Y
3) People who bought X also bought Y
4) People who bought X and Y together
Substitutes (1 and 2), and complements (3 and 4)

SLIDE 119

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

1) Data collection

SLIDE 120

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

2) Training data generation

SLIDE 121

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

2) Training

(simpler models)

SLIDE 122

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

2) Training

(simpler models)

SLIDE 123

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

3) Siamese CNNs
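Siamese networks of this kind are commonly trained with a contrastive loss on pairs of image embeddings: compatible pairs are pulled together, incompatible pairs pushed apart. A minimal sketch; the margin and the use of squared distances are generic assumptions, not necessarily the paper's exact objective:

```python
import numpy as np

def contrastive_loss(e1, e2, compatible, margin=1.0):
    """Pull embeddings of compatible items together; push incompatible ones past the margin."""
    d = np.linalg.norm(e1 - e2)
    if compatible:
        return d ** 2                      # compatible: penalize any distance
    return max(0.0, margin - d) ** 2       # incompatible: penalize only if closer than margin
```

Here e1 and e2 would be the outputs of the two (weight-sharing) CNN branches.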

SLIDE 124

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

4) Recommendation

SLIDE 125

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

SLIDE 126

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences