Recommendation Systems
Stony Brook University CSE545, Spring 2019


SLIDE 1

Recommendation Systems

Stony Brook University CSE545, Spring 2019

SLIDE 2
  • What other item will this user like?

(based on previously liked items)

  • How much will user like item X?

Recommendation Systems


SLIDE 5

Recommendation Systems

SLIDE 6

Past User Ratings

Recommendation Systems

SLIDE 7

Recommendation Systems

Why Big Data?

  • Data with many potential features (and sometimes observations)
  • An application of techniques for finding similar items
      ○ locality sensitive hashing
      ○ dimensionality reduction

SLIDE 8

Recommendation System: Example

SLIDE 9

SLIDE 10

Enabled by Web Shopping

  • Does Wal-Mart have everything you need?
SLIDE 11

Enabled by Web Shopping

  • Does Wal-Mart have everything you need?

(thelongtail.com)

SLIDE 12

Enabled by Web Shopping

  • Does Wal-Mart have everything you need?
  • A lot of products are only of interest to a small population (i.e. “long-tail products”).
  • However, most people buy many products that are from the long-tail.
  • Web shopping enables more choices
      ○ Harder to search
      ○ Recommendation engines to the rescue

(thelongtail.com)


SLIDE 14

A Model for Recommendation Systems

Given: users, items, utility matrix

SLIDE 15

A Model for Recommendation Systems

Given: users, items, utility matrix

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4, 5, 3, 3; B: 5, 4, 2; C: 5, 2

SLIDE 16

A Model for Recommendation Systems

Given: users, items, utility matrix

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4, 5, 3, 3; B: 5, 4, 2; C: 5, 2

? ? ?
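In code, the utility matrix above can be held as a dense array with NaN marking unrated cells. A minimal sketch (the exact positions of the blank cells are illustrative, since the slide text does not preserve column alignment):

```python
import numpy as np

# Utility matrix for users A, B, C over five shows; np.nan marks unrated.
# (Which cells are blank is not recoverable from the slide text; the
# placements below are illustrative.)
shows = ["Game of Thrones", "Fargo", "Brooklyn Nine-Nine",
         "Silicon Valley", "Walking Dead"]
U = np.array([
    [4.0, 5.0, np.nan, 3.0, 3.0],        # user A (4 ratings)
    [5.0, np.nan, 4.0, np.nan, 2.0],     # user B (3 ratings)
    [np.nan, 5.0, np.nan, 2.0, np.nan],  # user C (2 ratings)
])

# The recommendation task: predict the np.nan entries.
print(np.isnan(U).sum())  # number of unknown ratings to fill: 6
```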

SLIDE 17

Recommendation Systems

Problems to tackle:

1. Gathering ratings
   a. Explicit: based on user ratings and reviews (problem: only a few users engage in such tasks)
   b. Implicit: learn from actions (e.g. purchases, clicks) (problem: hard to learn low ratings)
2. Extrapolating unknown ratings
3. Evaluation

SLIDE 18

Recommendation Systems

Problems to tackle:

1. Gathering ratings
   a. Explicit: based on user ratings and reviews (problem: only a few users engage in such tasks)
   b. Implicit: learn from actions (e.g. purchases, clicks) (problem: hard to learn low ratings)
2. Extrapolating unknown ratings
3. Evaluation

Common Approaches

1. Content-based
2. Collaborative
3. Latent Factor
SLIDE 20

Utility Matrix:

Columns: p features (f1, f2, f3, f4, …, fp); rows: N observations (1, 2, 3, …, N).
Here, rows are users and columns are movies.

SLIDE 21

Goal: Complete Matrix

(rows 1…N: users; columns f1…fp: movies)

SLIDE 22

Problem: Given Incomplete Matrix

(rows 1…N: users; columns f1…fp: movies)

SLIDE 23

Complete Matrix using Latent Factors

Represent the matrix with columns f1, f2, …, fp (rows 1…N) using latent columns c1, c2, …, cp’: try to best represent the original, but with only p’ columns.

Dimensionality reduction

SLIDE 24

Complete Matrix using Latent Factors

Find latent factors, then reconstruct the matrix.

SLIDE 25

Dimensionality Reduction - PCA

Linear approximation of the data in r dimensions, found via Singular Value Decomposition:

X[n×p] = U[n×r] D[r×r] V[p×r]^T

X: original matrix; U: “left singular vectors”; D: “singular values” (diagonal); V: “right singular vectors”.

Projection (dimensionality-reduced space) in 3 dimensions: U[n×3] D[3×3] V[p×3]^T

To reduce features in a new dataset: X_new V = X_new_small
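A sketch of the decomposition in NumPy (note that `np.linalg.svd` returns V^T directly; the data here is random, just to show the shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))            # n=20 observations, p=5 features

# Full SVD: X = U @ np.diag(d) @ Vt
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Projection into 3 dimensions: the first 3 columns of U scaled by the
# top 3 singular values (equivalently X @ V[:, :3]).
X_proj = U[:, :3] * d[:3]

# To reduce features in a new dataset with the same basis: X_new @ V[:, :3]
X_new = rng.normal(size=(4, 5))
X_new_small = X_new @ V[:, :3]
print(X_proj.shape, X_new_small.shape)  # (20, 3) (4, 3)
```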

SLIDE 26

Dimensionality Reduction - PCA

Linear approximation of the data in r dimensions, found via Singular Value Decomposition:

X[n×p] = U[n×r] D[r×r] V[p×r]^T

X: original matrix; U: “left singular vectors”; D: “singular values” (diagonal); V: “right singular vectors”.

[figure: the n×p matrix X factored into U, D, and V^T]

SLIDE 27

Dimensionality Reduction - PCA - Example

X[n×p] = U[n×r] D[r×r] V[p×r]^T

Users-to-movies matrix


SLIDE 29

Dimensionality Reduction - PCA

Linear approximation of the data in r dimensions, found via Singular Value Decomposition:

X[n×p] = U[n×r] D[r×r] V[p×r]^T

X: original matrix; U: “left singular vectors”; D: “singular values” (diagonal); V: “right singular vectors”.

To check how well the original matrix can be reproduced: Z[n×p] = U D V^T. How does Z compare to the original X?
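The reconstruction check can be sketched as follows (toy data; keeping only the top r singular values gives the best rank-r approximation, whose Frobenius error is the norm of the discarded singular values):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 6))

U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Full-rank reconstruction Z = U D V^T is exact up to floating point.
Z = U @ np.diag(d) @ Vt
print(np.allclose(Z, X))  # True

# Rank-2 reconstruction: keep only the top-2 singular values.
r = 2
Z_r = U[:, :r] @ np.diag(d[:r]) @ Vt[:r, :]

# The Frobenius-norm error equals the norm of the discarded singular values.
err = np.linalg.norm(X - Z_r)
print(np.isclose(err, np.linalg.norm(d[r:])))  # True
```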

SLIDE 30

Dimensionality Reduction - PCA - Example

X[n×p] = U[n×r] D[r×r] V[p×r]^T

SLIDE 31

Recommendation Systems

Problems to tackle:

1. Gathering ratings
   a. Explicit: based on user ratings and reviews (problem: only a few users engage in such tasks)
   b. Implicit: learn from actions (e.g. purchases, clicks) (problem: hard to learn low ratings)
2. Extrapolating unknown ratings
3. Evaluation

Common Approaches

1. Content-based
2. Collaborative
3. Latent Factor
SLIDE 33

Content-based Rec Systems

Based on similarity of candidate items to past items that the user has rated.


SLIDE 35

Content-based Rec Systems

Based on similarity of items to past items that they have rated. 1. Build profiles of items (set of features); examples: shows: producer, actors, theme, review people: friends, posts

pick words with tf-idf

SLIDE 36

Content-based Rec Systems

Based on similarity of items to past items that they have rated. 1. Build profiles of items (set of features); examples: shows: producer, actors, theme, review people: friends, posts 2. Construct user profile from item profiles; approach: average all item profiles variation: weight by difference from their average

pick words with tf-idf

SLIDE 37

Content-based Rec Systems

Based on similarity of items to past items that they have rated. 1. Build profiles of items (set of features); examples: shows: producer, actors, theme, review people: friends, posts 2. Construct user profile from item profiles; approach: average all item profiles of items they’ve purchased variation: weight by difference from their average ratings 3. Predict ratings for new items; approach:

pick words with tf-idf x i
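The three steps can be sketched as follows (the binary item features and the ratings are made up for illustration, not from the slides):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two profile vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. Item profiles: rows = items, columns = features (toy values).
item_profiles = np.array([
    [1.0, 0.0, 1.0],   # item 0 (rated 5 by the user)
    [1.0, 1.0, 0.0],   # item 1 (rated 2 by the user)
    [0.0, 0.0, 1.0],   # item 2 (unrated; candidate to recommend)
])

# 2. User profile: combine item profiles, weighting each by the
#    difference from the user's own average rating (the "variation").
ratings = {0: 5.0, 1: 2.0}
avg = sum(ratings.values()) / len(ratings)          # 3.5
user_profile = sum((r - avg) * item_profiles[i] for i, r in ratings.items())

# 3. Predict affinity for the unrated item via cosine similarity.
score = cosine(user_profile, item_profiles[2])
print(round(score, 3))  # 0.707
```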

SLIDE 38

Why Content Based?

  • Only need the user’s history
  • Captures unique tastes
  • Can recommend new items
  • Can provide explanations

SLIDE 40

Why Content Based?

Pros:
  • Only need the user’s history
  • Captures unique tastes
  • Can recommend new items
  • Can provide explanations

Cons:
  • Need good features
  • New users don’t have history
  • Doesn’t venture “outside the box” (overspecialized; not exploiting other users’ judgments)

SLIDE 41

Collaborative Filtering Rec Systems

  • Need good features
  • New users don’t have history
  • Doesn’t venture “outside the box” (overspecialized; not exploiting other users’ judgments)

SLIDE 42

Recommendation Systems

Problems to tackle:

1. Gathering ratings
   a. Explicit: based on user ratings and reviews (problem: only a few users engage in such tasks)
   b. Implicit: learn from actions (e.g. purchases, clicks) (problem: hard to learn low ratings)
2. Extrapolating unknown ratings
3. Evaluation

Common Approaches

1. Content-based
2. Collaborative
3. Latent Factor
SLIDE 43

Collaborative Filtering Rec Systems

  • Neighborhood-based
SLIDE 44

Collaborative Filtering Rec Systems

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4, 5, 2, (missing), 3; B: 5, 4, 2; C: 5, 2

SLIDE 45

Collaborative Filtering Rec Systems

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4, 5, 2, (missing), 3; B: 5, 4, 2; C: 5, 2

General idea:
1) Find similar users (a “neighborhood”)
2) Infer the rating based on how similar users rated

SLIDE 46

Collaborative Filtering Rec Systems

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4, 5, 2, (missing), 3; B: 5, 4, 2; C: 5, 2

Given: user, x; item, i; utility matrix, u
1. Find neighborhood, N  # the set of k users most similar to x who have also rated i

SLIDE 47

Collaborative Filtering Rec Systems

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4, 5, 2, (missing), 3; B: 5, 4, 2; C: 5, 2

Given: user, x; item, i; utility matrix, u
1. Find neighborhood, N  # the set of k users most similar to x who have also rated i
Two challenges: (1) user bias, (2) missing values

SLIDE 48

Collaborative Filtering Rec Systems

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4 => 0.5, 5 => 1.5, 2 => -1.5, (missing) => 0, 3 => -0.5; B: 5, 4, 2; C: 5, 2

Given: user, x; item, i; utility matrix, u
1. Find neighborhood, N  # the set of k users most similar to x who have also rated i
Two challenges: (1) user bias, (2) missing values
Solution: subtract the user’s mean; use zeros for missing values
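User A's row works out exactly as in the table; a minimal sketch of the subtract-mean, zero-fill step:

```python
import numpy as np

# User A's ratings: 4, 5, 2, (missing), 3 -> mean is 3.5.
row = np.array([4.0, 5.0, 2.0, np.nan, 3.0])

centered = row - np.nanmean(row)                         # subtract user's mean
centered = np.where(np.isnan(centered), 0.0, centered)   # missing -> 0
print(centered)  # [ 0.5  1.5 -1.5  0.  -0.5]
```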

SLIDE 49

Collaborative Filtering Rec Systems

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4 => 0.5, 5 => 1.5, 2 => -1.5, (missing) => 0, 3 => -0.5; B: 5, 4, 2; C: 5, 2

Given: user, x; item, i; utility matrix, u

  • 0. Update u: mean center, missing to 0

1. Find neighborhood, N # set of k users most similar to x who have also rated i

  • - sim(x, other) = cosine_sim(u[x], u[other])
  • - threshold to top k (e.g. k = 30)
SLIDE 50

Collaborative Filtering Rec Systems

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4 => 0.5, 5 => 1.5, 2 => -1.5, (missing) => 0, 3 => -0.5; B: 5, 4, 2; C: 5, 2

Given: user, x; item, i; utility matrix, u

  • 0. Update u: mean center, missing to 0

1. Find neighborhood, N # set of k users most similar to x who have also rated i

  • - sim(x, other) = cosine_sim(u[x], u[other])
  • - threshold to top k (e.g. k = 30)
  • 2. Predict utility (rating) of i based on N
SLIDE 51

Collaborative Filtering Rec Systems

user | Game of Thrones | Fargo | Brooklyn Nine-Nine | Silicon Valley | Walking Dead
A: 4 => 0.5, 5 => 1.5, 2 => -1.5, (missing) => 0, 3 => -0.5; B: 5, 4, 2; C: 5, 2

Given: user, x; item, i; utility matrix, u

  • 0. Update u: mean center, missing to 0

1. Find neighborhood, N # set of k users most similar to x who have also rated i

  • - sim(x, other) = cosine_sim(u[x], u[other])
  • - threshold to top k (e.g. k = 30)
  • 2. Predict utility (rating) of i based on N
  • - average, weighted by sim
SLIDE 52

Collaborative Filtering Rec Systems

Given: user, x; item, i; utility matrix, u

  • 0. Update u: mean center, missing to 0

1. Find neighborhood, N # set of k users most similar to x who have also rated i

  • - sim(x, other) = cosine_sim(u[x], u[other])
  • - threshold to top k (e.g. k = 30)
  • 2. Predict utility (rating) of i based on N
  • - average, weighted by sim

“User-User collaborative filtering”
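Steps 0-2 can be put together in a short sketch (the `predict_user_user` helper and the toy matrix are illustrative, not from the slides; rows are users, np.nan marks missing ratings, and the blank-cell placements for users B and C are assumed):

```python
import numpy as np

def predict_user_user(R, x, i, k=2):
    """Predict R[x, i] via user-user collaborative filtering."""
    # 0. Mean-center each user's ratings; missing -> 0.
    means = np.nanmean(R, axis=1)
    C = np.where(np.isnan(R), 0.0, R - means[:, None])

    # 1. Neighborhood: cosine similarity to users who also rated item i,
    #    thresholded to the top k.
    def cos(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na and nb else 0.0

    sims = [(cos(C[x], C[u]), u) for u in range(len(R))
            if u != x and not np.isnan(R[u, i])]
    top = sorted(sims, reverse=True)[:k]

    # 2. Predict: average of neighbors' ratings, weighted by similarity.
    den = sum(s for s, _ in top)
    return sum(s * R[u, i] for s, u in top) / den if den else float(means[x])

R = np.array([
    [4, 5, 2, np.nan, 3],       # user A (as on the slide)
    [5, 4, np.nan, 2, np.nan],  # user B (blank placement assumed)
    [np.nan, 5, np.nan, 2, np.nan],  # user C (blank placement assumed)
])
print(predict_user_user(R, 0, 3))  # both neighbors rated item 3 as 2 -> 2.0
```

Flipping rows and columns (passing `R.T`) gives item-item filtering with the same code.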

SLIDE 53

Collaborative Filtering Rec Systems

Given: user, x; item, i; utility matrix, u

  • 0. Update u: mean center, missing to 0

1. Find neighborhood, N # set of k users most similar to x who have also rated i

  • - sim(x, other) = cosine_sim(u[x], u[other])
  • - threshold to top k (e.g. k = 30)
  • 2. Predict utility (rating) of i based on N
  • - average, weighted by sim

“User-User collaborative filtering”

Item-Item: Flip rows/columns of utility matrix and use same methods. (i.e. estimate rating of item i, by finding similar items, j)

SLIDE 54

Collaborative Filtering Rec Systems

Given: user, x; item, i; utility matrix, u

  • 0. Update u: mean center, missing to 0

1. Find neighborhood, N # set of k items most similar to i also rated by x

  • - sim(i, other) = cosine_sim(u[i], u[other])
  • - threshold to top k (e.g. k = 30)
  • 2. Predict utility (rating) by x based on N
  • - average, weighted by sim

“Item-Item collaborative filtering”

Item-Item: Flip rows/columns of utility matrix and use same methods. (i.e. estimate rating of item i, by finding similar items, j)


SLIDE 56

Item-Item v User-User

Item-item often works better than user-user. Why? Users tend to be more different from each other than items are from other items.

e.g. Mary likes jazz + rock, Bob likes classical + rock, but Mary may still have the same rock preferences as Bob. In other words, users span genres but items usually do not.

SLIDE 57

Item-Item: Example

SLIDE 58

Item-Item: Example

SLIDE 59

Item-Item: Example

Centered cosine: the same as cosine similarity after subtracting the mean (equivalent to Pearson correlation).

SLIDE 60

Item-Item: Example

SLIDE 61

Item-Item: Example

utility(1, 5) = (0.41*2 + 0.59*3) / (0.41 + 0.59) = 2.59
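Spelled out in code, the similarities weight the neighbors' ratings, normalized by the total similarity:

```python
# Similarity of item 1 to the two neighbor items, and user 5's
# ratings of those neighbors (values from the formula above).
sims = [0.41, 0.59]
ratings = [2, 3]

# Similarity-weighted average of the neighbors' ratings.
utility = sum(s * r for s, r in zip(sims, ratings)) / sum(sims)
print(round(utility, 2))  # 2.59
```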

SLIDE 62

Recommendation Systems

Problems to tackle:

1. Gathering ratings
   a. Explicit: based on user ratings and reviews (problem: only a few users engage in such tasks)
   b. Implicit: learn from actions (e.g. purchases, clicks) (problem: hard to learn low ratings)
2. Extrapolating unknown ratings
3. Evaluation

Common Approaches

1. Content-based
2. Collaborative
3. Latent Factor
SLIDE 63

Options for Parallelizing

1. Approximate solutions to PCA (very large speedups with little drawback!):
   a. Stochastic sampling (also sometimes called “randomized”, which is ambiguous): only use a sample of rows (i.e. users, for recommendation systems).
   b. Truncated SVD: only optimize for minimizing reconstruction error based on up to r dimensions (full SVD solves for up to min(n, p) dimensions, and then you truncate the result for the lower-rank version). Once you do this, by the way, using a smaller sample becomes much less of a problem.
   c. Limiting power iteration to a few iterations: power iteration, as in PageRank, solves for the first principal component; this can be extended to multiple components.
2. Distribute the matrix operations. Complex; not as flexible (usually done across processors within a node).
3. Data parallelism: as in other instances, stochastic or mini-batch gradient descent.
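Option 1c can be sketched in a few lines (the `first_component` helper is a made-up name; power iteration on X^T X converges to the top right-singular vector of X, i.e. the first principal component direction when X is mean-centered):

```python
import numpy as np

def first_component(X, n_iter=5000, seed=0):
    """Power iteration on X^T X: converges to the top eigenvector of
    X^T X, which is the top right-singular vector of X."""
    rng = np.random.default_rng(seed)
    A = X.T @ X                       # p x p
    v = rng.normal(size=A.shape[0])
    for _ in range(n_iter):
        v = A @ v
        v /= np.linalg.norm(v)        # re-normalize each step
    return v

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X = X - X.mean(axis=0)                # mean-center, as PCA assumes

v = first_component(X)
_, _, Vt = np.linalg.svd(X, full_matrices=False)

# Agrees with SVD's top right-singular vector up to sign.
print(round(abs(float(v @ Vt[0])), 4))  # close to 1.0 when converged
```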