[PPT] - Recommender Systems Jia-Bin Huang Virginia Tech Spring 2019 PowerPoint Presentation

SLIDE 1

Recommender Systems

Jia-Bin Huang Virginia Tech

Spring 2019

ECE-5424G / CS-5824

SLIDE 2

Administrative

HW 4 due April 10

SLIDE 3

Unsupervised Learning

Clustering, K-means
Expectation maximization
Dimensionality reduction
Anomaly detection
Recommendation system

SLIDE 4

Motivating example: Monitoring machines in a data center

[Figure: two scatter plots of machines, with axes $x_1$ (CPU load) and $x_2$ (memory use)]

SLIDE 5

Multivariate Gaussian (normal) distribution

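$x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \dots$ separately.
Model $p(x)$ all in one go.
Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix)

$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$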
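
As a quick sanity check of the density, here is a worked instance. The choices $n = 2$, $\mu = 0$, $\Sigma = I$ and the two evaluation points are illustrative assumptions, and the exponent is taken with its usual factor of $\tfrac{1}{2}$.

```latex
p(\mu; \mu, I) = \frac{1}{(2\pi)^{2/2}\,|I|^{1/2}} \exp(0) = \frac{1}{2\pi} \approx 0.159
\qquad
p\!\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}; 0, I\right) = \frac{1}{2\pi} \exp\!\left(-\tfrac{1}{2}\right) \approx 0.097
```

The density is largest at the mean and falls off as the quadratic form $(x - \mu)^\top \Sigma^{-1} (x - \mu)$ grows.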

SLIDE 6

Multivariate Gaussian (normal) examples

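[Figure: three density plots over axes $x_1$, $x_2$]
$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 0.6 & 0 \\ 0 & 0.6 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$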

SLIDE 7

Multivariate Gaussian (normal) examples

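[Figure: three density plots over axes $x_1$, $x_2$]
$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 0.6 & 0 \\ 0 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$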

SLIDE 8

Multivariate Gaussian (normal) examples

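[Figure: three density plots over axes $x_1$, $x_2$]
$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$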

SLIDE 9

Anomaly detection using the multivariate Gaussian distribution

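1. Fit model $p(x)$ by setting

$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$

$\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^\top$

2. Given a new example $x$, compute

$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$

Flag an anomaly if $p(x) < \varepsilon$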
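
The two steps above can be sketched in a few lines of Python. This is a minimal illustration restricted to $n = 2$ so the covariance inverse can be written in closed form. The training points, the threshold `EPSILON`, and all function names are assumptions made for the example and are not taken from the slides.

```python
import math

def fit_gaussian_2d(points):
    """Return the sample mean and the 2x2 covariance (1/m normalisation)."""
    m = len(points)
    mu = [sum(p[d] for p in points) / m for d in range(2)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / m
            for b in range(2)]
           for a in range(2)]
    return mu, cov

def density_2d(x, mu, cov):
    """Multivariate Gaussian density for n = 2 via the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T Sigma^{-1} (x - mu), with Sigma^{-1} = [[d, -b], [-c, a]] / det
    quad = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# Hypothetical (CPU load, memory use) readings where the two move together.
train = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
mu, cov = fit_gaussian_2d(train)
EPSILON = 0.02  # assumed threshold; in practice it would be tuned on labelled data

def is_anomaly(x):
    """Flag x when its density under the fitted model falls below EPSILON."""
    return density_2d(x, mu, cov) < EPSILON
```

A point such as `(1.0, 5.0)` (low CPU load, high memory use) has ordinary values in each coordinate on its own but an unusual combination, which is the case the full covariance matrix is meant to catch.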

SLIDE 10

Original model

$p(x_1; \mu_1, \sigma_1^2) \, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2)$

Manually create features to capture anomalies where $x_1$, $x_2$ take unusual combinations of values
Computationally cheaper (alternatively, scales better)
OK even if training set size is small

Multivariate Gaussian model

$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$

SLIDE 11

Recommender Systems

Motivation
Problem formulation
Content-based recommendations
Collaborative filtering
Mean normalization

SLIDE 16

Example: Predicting movie ratings

User rates movies using zero to five stars

Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4)
Love at last | 5 | 5 | 0 | 0
Romance forever | 5 | ? | ? | 0
Cute puppies of love | ? | 4 | 0 | ?
Nonstop car chases | 0 | 0 | 5 | 4
Swords vs. karate | 0 | 0 | 5 | ?

$n_u$ = no. of users
$n_m$ = no. of movies
$r(i,j) = 1$ if user $j$ has rated movie $i$
$y^{(i,j)}$ = rating given by user $j$ to movie $i$
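
To make the notation concrete, here is a small sketch of how such a ratings table might be stored. The layout (rows as movies, columns as users, `None` for an unrated "?" cell) and the exact cell values are illustrative assumptions patterned on the movie example, and indices are 0-based in the code.

```python
# Rows are movies, columns are users; None marks a movie the user has not rated.
ratings = [
    [5,    5,    0,    0],     # Love at last
    [5,    None, None, 0],     # Romance forever
    [None, 4,    0,    None],  # Cute puppies of love
    [0,    0,    5,    4],     # Nonstop car chases
    [0,    0,    5,    None],  # Swords vs. karate
]

n_m = len(ratings)      # number of movies
n_u = len(ratings[0])   # number of users

def r(i, j):
    """Indicator: 1 if user j has rated movie i, else 0 (0-based indices)."""
    return 1 if ratings[i][j] is not None else 0

# All (movie, user) pairs that carry an observed rating y^(i,j).
rated_pairs = [(i, j) for i in range(n_m) for j in range(n_u) if r(i, j)]
```

Only the pairs in `rated_pairs` contribute to the training objective; the recommender's job is to predict the remaining cells.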

SLIDE 17

Recommender Systems

Motivation
Problem formulation
Content-based recommendations
Collaborative filtering
Mean normalization

SLIDE 18

Content-based recommender systems

Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | $x_1$ (romance) | $x_2$ (action)
Love at last | 5 | 5 | 0 | 0 | 0.9 | 0
Romance forever | 5 | ? | ? | 0 | 1.0 | 0.01
Cute puppies of love | ? | 4 | 0 | ? | 0.99 | 0
Nonstop car chases | 0 | 0 | 5 | 4 | 0.1 | 1.0
Swords vs. karate | 0 | 0 | 5 | ? | 0 | 0.9

For each user $j$, learn a parameter $\theta^{(j)} \in \mathbb{R}^3$. Predict user $j$ as rating movie $i$ with $(\theta^{(j)})^\top x^{(i)}$ stars.

SLIDE 19

Content-based recommender systems

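Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | $x_1$ (romance) | $x_2$ (action)
Love at last | 5 | 5 | 0 | 0 | 0.9 | 0
Romance forever | 5 | ? | ? | 0 | 1.0 | 0.01
Cute puppies of love | ? | 4 | 0 | ? | 0.99 | 0
Nonstop car chases | 0 | 0 | 5 | 4 | 0.1 | 1.0
Swords vs. karate | 0 | 0 | 5 | ? | 0 | 0.9

For each user $j$, learn a parameter $\theta^{(j)} \in \mathbb{R}^3$. Predict user $j$ as rating movie $i$ with $(\theta^{(j)})^\top x^{(i)}$ stars.

Example (with intercept feature $x_0 = 1$): $x^{(3)} = \begin{bmatrix} 1 \\ 0.99 \\ 0 \end{bmatrix}$, $\theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}$, so $(\theta^{(1)})^\top x^{(3)} = 5 \times 0.99 = 4.95$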
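
The same prediction can be written as a one-line dot product. This is a minimal sketch; the function name `predict` and the list layout `[intercept, romance, action]` are assumptions made for the example.

```python
def predict(theta, x):
    """Predicted star rating: the inner product theta^T x."""
    return sum(t * v for t, v in zip(theta, x))

# Feature vector for "Cute puppies of love": [x0 = 1 (intercept), romance, action]
x3 = [1.0, 0.99, 0.0]
# Parameter vector for Alice (user 1): cares only about the romance feature
theta1 = [0.0, 5.0, 0.0]

alice_on_cute_puppies = predict(theta1, x3)  # about 4.95 stars
```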

SLIDE 20

Problem formulation

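$r(i,j) = 1$ if user $j$ has rated movie $i$
$y^{(i,j)}$ = rating given by user $j$ to movie $i$
$\theta^{(j)}$ = parameter vector for user $j$
$x^{(i)}$ = feature vector for movie $i$
For user $j$ and movie $i$, predicted rating: $(\theta^{(j)})^\top x^{(i)}$
$m^{(j)}$ = no. of movies rated by user $j$

Goal: learn $\theta^{(j)}$:

$\min_{\theta^{(j)}} \frac{1}{2 m^{(j)}} \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2 m^{(j)}} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$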

SLIDE 21

Optimization objective

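Learn $\theta^{(j)}$ (parameter for user $j$):

$\min_{\theta^{(j)}} \frac{1}{2} \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$

Learn $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$:

$\min_{\theta^{(1)}, \dots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$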
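
The objective over all users can be evaluated directly from its definition. The sketch below is an illustration under assumed conventions: ratings are stored as a list of rows (movies) with `None` for unrated cells, each feature vector starts with the intercept $x_0 = 1$, and the intercept weight is left out of the regularization sum (which runs over $k = 1, \dots, n$). The function and argument names are hypothetical.

```python
def content_based_cost(thetas, features, ratings, lam):
    """0.5 * squared error over rated cells + 0.5 * lam * sum of theta_k^2 for k >= 1."""
    sq_err = 0.0
    for i, row in enumerate(ratings):          # i indexes movies
        for j, y in enumerate(row):            # j indexes users
            if y is not None:                  # only cells with r(i, j) = 1
                pred = sum(t * v for t, v in zip(thetas[j], features[i]))
                sq_err += (pred - y) ** 2
    # Regularise every parameter except the intercept weight theta_0.
    reg = sum(t * t for theta in thetas for t in theta[1:])
    return 0.5 * sq_err + 0.5 * lam * reg

# Tiny made-up instance: two movies, two users, features [x0, romance, action].
features = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
ratings = [[5, None], [None, 4]]
thetas = [[0.0, 5.0, 0.0], [0.0, 0.0, 3.0]]
example_cost = content_based_cost(thetas, features, ratings, lam=0.1)
```

In this instance the first user is predicted exactly and the second is off by one star, so the squared-error part is 0.5 and the regularization part is 0.05 * (25 + 9) = 1.7.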

SLIDE 22

Optimization algorithm

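$\min_{\theta^{(1)}, \dots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$

Gradient descent update:

$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) x_k^{(i)}$ (for $k = 0$)

$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)$ (for $k \neq 0$)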
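
One way to read the update rule is as a single step applied to one user's parameter vector. The sketch below is a minimal illustration, not the course's implementation: `gradient_step`, the `(x, y)` pair layout, and the toy data are assumptions, and each feature vector is assumed to start with the intercept $x_0 = 1$.

```python
def gradient_step(theta, rated, alpha, lam):
    """One gradient descent step for a single user.

    rated: list of (x, y) pairs for the movies this user has rated,
           where x is the movie's feature vector and y the rating.
    """
    # Prediction error on every rated movie, computed with the current theta.
    errors = [sum(t * v for t, v in zip(theta, x)) - y for x, y in rated]
    new_theta = []
    for k in range(len(theta)):
        grad = sum(e * x[k] for e, (x, _) in zip(errors, rated))
        if k != 0:                      # the intercept weight is not regularised
            grad += lam * theta[k]
        new_theta.append(theta[k] - alpha * grad)
    return new_theta

# Toy data: the user gave 5 stars to a romance movie and 0 to an action movie.
rated = [([1.0, 1.0, 0.0], 5.0), ([1.0, 0.0, 1.0], 0.0)]
first = gradient_step([0.0, 0.0, 0.0], rated, alpha=0.1, lam=0.0)

theta = [0.0, 0.0, 0.0]
for _ in range(500):
    theta = gradient_step(theta, rated, alpha=0.1, lam=0.0)
```

All components are updated from the same set of errors, so the step is simultaneous across $k$. With `lam=0.0` the repeated steps drive both prediction errors toward zero on this toy data.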

SLIDE 23

Recommender Systems

Motivation
Problem formulation
Content-based recommendations
Collaborative filtering
Mean normalization

SLIDE 24

Problem motivation

Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | $x_1$ (romance) | $x_2$ (action)
Love at last | 5 | 5 | 0 | 0 | 0.9 | 0
Romance forever | 5 | ? | ? | 0 | 1.0 | 0.01
Cute puppies of love | ? | 4 | 0 | ? | 0.99 | 0
Nonstop car chases | 0 | 0 | 5 | 4 | 0.1 | 1.0
Swords vs. karate | 0 | 0 | 5 | ? | 0 | 0.9

SLIDE 25

Problem motivation

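Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | $x_1$ (romance) | $x_2$ (action)
Love at last | 5 | 5 | 0 | 0 | ? | ?
Romance forever | 5 | ? | ? | 0 | ? | ?
Cute puppies of love | ? | 4 | 0 | ? | ? | ?
Nonstop car chases | 0 | 0 | 5 | 4 | ? | ?
Swords vs. karate | 0 | 0 | 5 | ? | ? | ?

$\theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}$, $\theta^{(2)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}$, $\theta^{(3)} = \begin{bmatrix} 0 \\ 0 \\ 5 \end{bmatrix}$, $\theta^{(4)} = \begin{bmatrix} 0 \\ 0 \\ 5 \end{bmatrix}$, $x^{(1)} = \begin{bmatrix} ? \\ ? \\ ? \end{bmatrix}$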

SLIDE 26

Optimization algorithm

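Given $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$, to learn $x^{(i)}$:

$\min_{x^{(i)}} \frac{1}{2} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$

Given $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$, to learn $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$:

$\min_{x^{(1)}, \dots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$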

SLIDE 27

Collaborative filtering

Given $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$ (and movie ratings),

Can estimate $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$

Given $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$

Can estimate $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$

SLIDE 28

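Collaborative filtering optimization objective

Given $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$, estimate $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$

$\min_{\theta^{(1)}, \dots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$

Given $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$, estimate $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$

$\min_{x^{(1)}, \dots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$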

SLIDE 29

Collaborative filtering optimization objective

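Given $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$, estimate $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$

$\min_{\theta^{(1)}, \dots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$

Given $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$, estimate $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$

$\min_{x^{(1)}, \dots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$

Minimize over $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$ and $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$ simultaneously:

$J = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$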

SLIDE 30

Collaborative filtering optimization objective

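$J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$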

SLIDE 31

Collaborative filtering algorithm

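1. Initialize $x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}$ to small random values.

2. Minimize $J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)})$ using gradient descent (or an advanced optimization algorithm). For every $j = 1, \dots, n_u$, $i = 1, \dots, n_m$:

$x_k^{(i)} := x_k^{(i)} - \alpha \left( \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right)$

$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)$

3. For a user with parameter $\theta$ and a movie with (learned) feature $x$, predict a star rating of $\theta^\top x$.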
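
The algorithm above can be sketched with plain gradient descent on both sets of variables at once. This is an illustrative sketch under assumptions: the ratings table is patterned on the movie example, there is no intercept feature, and the hyperparameters (`n=2`, `alpha`, `lam`, `iters`, `seed`) and function names are made up for the example rather than taken from the slides.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cost(X, Theta, ratings, lam):
    """J: squared error over rated cells plus L2 penalties on X and Theta."""
    sq = sum((dot(Theta[j], X[i]) - y) ** 2
             for i, row in enumerate(ratings)
             for j, y in enumerate(row) if y is not None)
    reg = sum(v * v for row in X + Theta for v in row)
    return 0.5 * sq + 0.5 * lam * reg

def collaborative_filtering(ratings, n=2, alpha=0.005, lam=0.1, iters=3000, seed=0):
    rng = random.Random(seed)
    n_m, n_u = len(ratings), len(ratings[0])
    # Step 1: small random initial values for movie features and user parameters.
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n_m)]
    Theta = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n_u)]
    history = [cost(X, Theta, ratings, lam)]
    for _ in range(iters):
        # Gradients start from the regularisation terms lam * x and lam * theta.
        grad_X = [[lam * v for v in row] for row in X]
        grad_T = [[lam * v for v in row] for row in Theta]
        for i in range(n_m):
            for j in range(n_u):
                if ratings[i][j] is None:
                    continue
                err = dot(Theta[j], X[i]) - ratings[i][j]
                for k in range(n):
                    grad_X[i][k] += err * Theta[j][k]
                    grad_T[j][k] += err * X[i][k]
        # Step 2: simultaneous update of all features and all parameters.
        X = [[v - alpha * g for v, g in zip(r, gr)] for r, gr in zip(X, grad_X)]
        Theta = [[v - alpha * g for v, g in zip(r, gr)] for r, gr in zip(Theta, grad_T)]
        history.append(cost(X, Theta, ratings, lam))
    return X, Theta, history

ratings = [
    [5, 5, 0, 0],
    [5, None, None, 0],
    [None, 4, 0, None],
    [0, 0, 5, 4],
    [0, 0, 5, None],
]
X, Theta, history = collaborative_filtering(ratings)
# Step 3: predicted rating of user j for movie i is dot(Theta[j], X[i]).
```

The random initialization matters: if every value started at exactly zero, both gradients would vanish and nothing would be learned. `history` records the value of $J$ after each iteration so that convergence can be inspected.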

SLIDE 32

Collaborative filtering

Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4)
Love at last | 5 | 5 | 0 | 0
Romance forever | 5 | ? | ? | 0
Cute puppies of love | ? | 4 | 0 | ?
Nonstop car chases | 0 | 0 | 5 | 4
Swords vs. karate | 0 | 0 | 5 | ?

SLIDE 33

Collaborative filtering

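Predicted ratings:

$X = \begin{bmatrix} (x^{(1)})^\top \\ (x^{(2)})^\top \\ \vdots \\ (x^{(n_m)})^\top \end{bmatrix}$, $\Theta = \begin{bmatrix} (\theta^{(1)})^\top \\ (\theta^{(2)})^\top \\ \vdots \\ (\theta^{(n_u)})^\top \end{bmatrix}$

$Y = X \Theta^\top$, whose $(i,j)$ entry is the predicted rating $(\theta^{(j)})^\top x^{(i)}$

Low-rank matrix factorization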
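
The matrix form can be checked on a tiny case. This is a minimal sketch with made-up numbers (two movies, two users, two features); the function name `predicted_ratings` is an assumption.

```python
def predicted_ratings(X, Theta):
    """Return Y = X Theta^T: entry (i, j) is the dot product of x^(i) and theta^(j)."""
    return [[sum(x_k * t_k for x_k, t_k in zip(x, theta)) for theta in Theta]
            for x in X]

# Rows of X are movie feature vectors; rows of Theta are user parameter vectors.
X = [[0.9, 0.0],    # mostly romance
     [0.1, 1.0]]    # mostly action
Theta = [[5.0, 0.0],   # user who likes romance
         [0.0, 5.0]]   # user who likes action
Y = predicted_ratings(X, Theta)
```

Because $Y$ is a product of an $n_m \times n$ matrix and an $n \times n_u$ matrix, its rank is at most $n$, which is where the name low-rank matrix factorization comes from.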

SLIDE 34

Finding related movies/products

For each product $i$, we learn a feature vector $x^{(i)} \in \mathbb{R}^n$

$x_1$: romance, $x_2$: action, $x_3$: comedy, …

How to find movies $j$ related to movie $i$?

Small $\| x^{(i)} - x^{(j)} \|$ means movies $j$ and $i$ are “similar”
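
A nearest-neighbor lookup over the feature vectors is one way to act on this. The sketch below uses Euclidean distance and two hand-written features per movie (romance, action); the feature values and the helper name `most_similar` are illustrative assumptions, and learned features would be used in practice.

```python
import math

titles = ["Love at last", "Romance forever", "Cute puppies of love",
          "Nonstop car chases", "Swords vs. karate"]
features = [(0.9, 0.0), (1.0, 0.01), (0.99, 0.0), (0.1, 1.0), (0.0, 0.9)]

def most_similar(i, top=2):
    """Indices of the `top` movies whose feature vectors are closest to movie i."""
    dists = [(math.dist(features[i], x), j)
             for j, x in enumerate(features) if j != i]
    return [j for _, j in sorted(dists)[:top]]

related = [titles[j] for j in most_similar(0)]
```

For "Love at last" the two closest movies are the other romance titles, since the action movies sit far away along the second feature.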

SLIDE 35

Recommender Systems

Motivation
Problem formulation
Content-based recommendations
Collaborative filtering
Mean normalization

SLIDE 36

Users who have not rated any movies

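$\frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$

Eve (user 5) has rated no movies, so only the regularization term depends on $\theta^{(5)}$, and minimizing it gives $\theta^{(5)} = 0$

Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | Eve (5)
Love at last | 5 | 5 | 0 | 0 | ?
Romance forever | 5 | ? | ? | 0 | ?
Cute puppies of love | ? | 4 | 0 | ? | ?
Nonstop car chases | 0 | 0 | 5 | 4 | ?
Swords vs. karate | 0 | 0 | 5 | ? | ?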

SLIDE 37

Users who have not rated any movies

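$\frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$

$\theta^{(5)} = 0$, so every predicted rating for Eve is $(\theta^{(5)})^\top x^{(i)} = 0$

Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | Eve (5), predicted
Love at last | 5 | 5 | 0 | 0 | 0
Romance forever | 5 | ? | ? | 0 | 0
Cute puppies of love | ? | 4 | 0 | ? | 0
Nonstop car chases | 0 | 0 | 5 | 4 | 0
Swords vs. karate | 0 | 0 | 5 | ? | 0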

SLIDE 38

Mean normalization

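For user $j$, on movie $i$ predict: $(\theta^{(j)})^\top x^{(i)} + \mu_i$, where $\mu_i$ is the mean of the observed ratings for movie $i$

User 5 (Eve): $\theta^{(5)} = 0$, so $(\theta^{(5)})^\top x^{(i)} + \mu_i = \mu_i$

Learn $\theta^{(j)}$, $x^{(i)}$ from the mean-subtracted ratings $y^{(i,j)} - \mu_i$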
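
Mean normalization can be sketched as a small preprocessing step plus an adjusted prediction. The code below is an illustration under assumptions: the ratings table is patterned on the movie example with a fifth user (Eve) who has rated nothing, `None` marks unrated cells, and the helper names are hypothetical.

```python
def movie_means(ratings):
    """Mean of the observed ratings for each movie (0.0 if a movie has none)."""
    means = []
    for row in ratings:
        seen = [y for y in row if y is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return means

def normalize(ratings, means):
    """Subtract each movie's mean from its observed ratings; keep None as None."""
    return [[None if y is None else y - mu for y in row]
            for row, mu in zip(ratings, means)]

def predict(theta, x, mu_i):
    """Prediction with the movie mean added back: theta^T x + mu_i."""
    return sum(t * v for t, v in zip(theta, x)) + mu_i

ratings = [
    [5, 5, 0, 0, None],
    [5, None, None, 0, None],
    [None, 4, 0, None, None],
    [0, 0, 5, 4, None],
    [0, 0, 5, None, None],
]
mu = movie_means(ratings)
centered = normalize(ratings, mu)   # train collaborative filtering on this table

# Eve has theta = 0, so her prediction for any movie is that movie's mean rating.
eve_on_first_movie = predict([0.0, 0.0], [0.9, 0.0], mu[0])
```

Instead of predicting zero stars everywhere for a new user, the model falls back to each movie's average rating, which is a more sensible default.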

SLIDE 39

Recommender Systems

Motivation
Problem formulation
Content-based recommendations
Collaborative filtering
Mean normalization
