Data Mining Techniques
CS 6220 - Section 3 - Fall 2016
Lecture 14
Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS246)
Recommender Systems

The Long Tail
(from: https://www.wired.com/2004/10/tail/)
[Figure: movies placed on two latent dimensions, "geared towards females" vs. "geared towards males" and "serious" vs. "escapist": The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility; users Gus and Dave shown in the same space]
Idea: Predict rating using item features on a per-user basis
Idea: Predict rating using user features on a per-item basis
[Figure: user Joe matched against similar users #1, #2, #3, #4]
Idea: Predict rating based on similarity to other users
Training data:

user  movie  date      score
1     21     5/7/02    1
1     213    8/2/04    5
2     345    3/6/01    4
2     123    5/1/05    4
2     768    7/15/02   3
3     76     1/22/01   5
4     45     8/3/00    4
5     568    9/10/05   1
5     342    3/5/03    2
5     234    12/28/00  2
6     76     8/11/02   5
6     56     6/15/03   4

Test data:

user  movie  date      score
1     62     1/6/05    ?
1     96     9/13/04   ?
2     7      8/18/05   ?
2     3      11/22/05  ?
3     47     6/13/02   ?
3     15     8/12/01   ?
4     41     9/1/00    ?
4     28     8/27/05   ?
5     93     4/4/05    ?
5     74     7/16/03   ?
6     69     2/14/04   ?
6     83     10/3/03   ?
Objective: minimize the root-mean-squared error over the observed ratings

\[ \mathrm{RMSE} = \sqrt{\frac{1}{|S|} \sum_{(i,u)\in S} (\hat{r}_{ui} - r_{ui})^2 } \]

(doesn't tell you how to actually do recommendation)
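As a sketch of this objective on toy data (the ratings and predictions below are hypothetical, not from the slides):

```python
import math

# Toy observed ratings: (movie, user) -> score, echoing the training table
ratings = {(21, 1): 5, (213, 1): 4, (345, 2): 4}
# Hypothetical predictions from some recommender
predictions = {(21, 1): 4.5, (213, 1): 4.0, (345, 2): 3.0}

def rmse(predictions, ratings):
    """Root-mean-squared error over the observed pairs (i, u) in S."""
    se = sum((predictions[iu] - r) ** 2 for iu, r in ratings.items())
    return math.sqrt(se / len(ratings))

print(rmse(predictions, ratings))  # ~0.6455
```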
Netflix then Netflix now
Learn a set of regression coefficients for each user:

\[ w_u = \operatorname*{argmin}_{w} \| r_u - X w \|^2 \]
[Table: ratings for Moonrise Kingdom by four users (4, 5, 4, 4), with example feature weights 0.3, 0.2]
Problem: some movies are universally loved / hated, and some users are more picky than others.
Solution: introduce a per-movie and a per-user bias.
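A minimal sketch of such a bias model, estimated here as mean deviations on hypothetical ratings (the slides do not prescribe this particular estimator):

```python
# Hypothetical ratings: (user, movie) -> score
ratings = {("Alice", "Amadeus"): 5, ("Alice", "Braveheart"): 3,
           ("Bob", "Amadeus"): 4, ("Bob", "Braveheart"): 2}

mu = sum(ratings.values()) / len(ratings)  # global mean rating

def mean_dev(index):
    """Average deviation from mu, keyed by user (index 0) or movie (index 1)."""
    dev, cnt = {}, {}
    for key, r in ratings.items():
        k = key[index]
        dev[k] = dev.get(k, 0.0) + (r - mu)
        cnt[k] = cnt.get(k, 0) + 1
    return {k: dev[k] / cnt[k] for k in dev}

b_user, b_movie = mean_dev(0), mean_dev(1)

def baseline(u, i):
    """b_ui = mu + b_u + b_i (unseen users or movies get a zero bias)."""
    return mu + b_user.get(u, 0.0) + b_movie.get(i, 0.0)

print(baseline("Alice", "Amadeus"))  # 5.0: generous user, well-liked movie
```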
Are movies getting better with time? The apparent jump in mean ratings around 2004 coincides with Netflix changing its rating labels.
Solution: model temporal effects in the biases, not the weights.
Users and items form a bipartite graph (edges are ratings)
- (user, user) similarity: predict rating from the k nearest users
- (item, item) similarity: predict rating from the k nearest items
\[ \hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i,u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i,u)} s_{ij}}, \qquad b_{ui} = \mu + b_u + b_i \]

where \(S^k(i,u)\) is the set of k items most similar to i that were rated by user u.
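The neighborhood prediction can be sketched as follows; the `ratings`, `sim`, and `baseline` objects are hypothetical stand-ins:

```python
def predict(u, i, ratings, sim, baseline, k=2):
    """r_hat_ui = b_ui + sum_j s_ij (r_uj - b_uj) / sum_j s_ij,
    where j runs over the k items most similar to i that u has rated."""
    rated = [j for (uu, j) in ratings if uu == u and j != i]
    neighbors = sorted(rated, key=lambda j: sim[(i, j)], reverse=True)[:k]
    num = sum(sim[(i, j)] * (ratings[(u, j)] - baseline(u, j)) for j in neighbors)
    den = sum(sim[(i, j)] for j in neighbors)
    return baseline(u, i) + (num / den if den > 0 else 0.0)

# Hypothetical data: user "u" rated items "a" and "b"; predict item "c"
ratings = {("u", "a"): 4.0, ("u", "b"): 2.0}
sim = {("c", "a"): 0.8, ("c", "b"): 0.2}
base = lambda u, i: 3.0  # stand-in for b_ui = mu + b_u + b_i
print(predict("u", "c", ratings, sim, base))  # 3.0 + (0.8 - 0.2) / 1.0 = 3.6
```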
Each item is rated by a distinct set of users.

[Figure: user ratings for item i and item j; similarity is computed over the users who rated both]
Empirical estimate of the Pearson correlation coefficient:

\[ \hat{\rho}_{ij} = \frac{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})(r_{uj} - b_{uj})}{\sqrt{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})^2 \sum_{u \in U(i,j)} (r_{uj} - b_{uj})^2}} \]

Regularize towards 0 for small support:

\[ s_{ij} = \frac{|U(i,j)| - 1}{|U(i,j)| - 1 + \lambda} \, \hat{\rho}_{ij} \]

Regularize towards the baseline for small neighborhoods:

\[ \hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i,u)} s_{ij} (r_{uj} - b_{uj})}{\lambda + \sum_{j \in S^k(i,u)} s_{ij}} \]
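A sketch of the shrunk Pearson estimate; `ri`/`rj` map users to their ratings of items i and j, and `bi`/`bj` are the corresponding baselines (all hypothetical):

```python
def shrunk_pearson(ri, rj, bi, bj, lam=100.0):
    """Pearson correlation over U(i, j) (users who rated both i and j),
    shrunk towards 0 when the support is small."""
    common = set(ri) & set(rj)
    n = len(common)
    if n < 2:
        return 0.0
    di = [ri[u] - bi[u] for u in common]
    dj = [rj[u] - bj[u] for u in common]
    num = sum(x * y for x, y in zip(di, dj))
    den = (sum(x * x for x in di) * sum(y * y for y in dj)) ** 0.5
    rho = num / den if den > 0 else 0.0
    return (n - 1) / (n - 1 + lam) * rho  # shrinkage factor (n-1)/(n-1+lam)

# Two common users with perfectly correlated deviations, but tiny support
ri, rj = {1: 5, 2: 1}, {1: 4, 2: 2}
bi, bj = {1: 3, 2: 3}, {1: 3, 2: 3}
print(shrunk_pearson(ri, rj, bi, bj))  # rho = 1, heavily shrunk to 1/101
```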
Pearson correlation is not meaningful for binary labels (e.g. views, purchases, clicks). With \(m_i\) users acting on i, \(m_{ij}\) users acting on both i and j, and m the total number of users:

Jaccard similarity:

\[ s_{ij} = \frac{m_{ij}}{\alpha + m_i + m_j - m_{ij}} \]

Observed / expected ratio:

\[ s_{ij} = \frac{\text{observed}}{\text{expected}} \approx \frac{m_{ij}}{\alpha + m_i m_j / m} \]
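Both binary-label similarities can be sketched directly from the formulas; `alpha` plays the role of the smoothing constant α, and the viewer sets are hypothetical:

```python
def jaccard_sim(users_i, users_j, alpha=5.0):
    """Smoothed Jaccard: m_ij / (alpha + m_i + m_j - m_ij)."""
    m_ij = len(users_i & users_j)
    return m_ij / (alpha + len(users_i) + len(users_j) - m_ij)

def obs_exp_sim(users_i, users_j, m, alpha=5.0):
    """Observed / expected ratio: m_ij / (alpha + m_i * m_j / m)."""
    m_ij = len(users_i & users_j)
    return m_ij / (alpha + len(users_i) * len(users_j) / m)

# Hypothetical view sets over m = 10 users
viewers_i, viewers_j = {1, 2, 3}, {2, 3, 4}
print(jaccard_sim(viewers_i, viewers_j))      # 2 / (5 + 3 + 3 - 2) = 2/9
print(obs_exp_sim(viewers_i, viewers_j, 10))  # 2 / (5 + 3*3/10) = 2/5.9
```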
Idea: pose as (biased) matrix factorization problem
[Figure: a users × items rating matrix approximated by a rank-3 SVD, i.e. the product of a users × 3 factor matrix and a 3 × items factor matrix]
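A rank-3 approximation can be sketched with NumPy on a toy, fully observed matrix (real rating matrices are mostly missing, which is why the lecture moves on to regularized matrix factorization):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy fully observed users x items matrix of ratings 1..5
R = rng.integers(1, 6, size=(6, 5)).astype(float)

# Rank-3 approximation via truncated SVD
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R3 = (U[:, :3] * s[:3]) @ Vt[:3, :]

# By Eckart-Young, R3 is the best rank-3 fit to R in Frobenius norm
print(np.linalg.norm(R - R3))
```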
[Figure: the same rank-3 approximation used to fill in a missing rating (?) with the reconstructed value 2.4]
Pose as a regression problem and regularize using the Frobenius norm:

\[ \min_{W, X} \sum_{(i,u) \in S} \left( r_{ui} - w_u^\top x_i \right)^2 + \lambda \left( \|W\|_F^2 + \|X\|_F^2 \right) \]
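The regularized objective can be evaluated as a sketch, with observed entries stored as a dict and hypothetical rank-2 factors:

```python
import numpy as np

def mf_loss(R_obs, W, X, lam=0.1):
    """sum_{(u,i) in S} (r_ui - w_u . x_i)^2 + lam * (||W||_F^2 + ||X||_F^2)."""
    err = sum((r - W[u] @ X[i]) ** 2 for (u, i), r in R_obs.items())
    return err + lam * (np.sum(W ** 2) + np.sum(X ** 2))

W = np.array([[1.0, 0.0], [0.0, 1.0]])  # user factors, one row per user
X = np.array([[2.0, 0.0], [0.0, 3.0]])  # item factors, one row per item
R_obs = {(0, 0): 2.0, (1, 1): 4.0}      # only two entries observed
print(mf_loss(R_obs, W, X))  # (2-2)^2 + (4-3)^2 + 0.1*(2 + 13) = 2.5
```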
Holding the item factors X fixed, fit each user's weights (regress \(w_u\) given X).
Remember ridge regression? With L2 regularization, each such regression has a closed-form solution:

\[ w = (X^\top X + \lambda I)^{-1} X^\top y \]
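A sketch of the closed-form ridge solve on toy data:

```python
import numpy as np

def ridge(X, y, lam=1.0):
    """Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])  # consistent with w = (1, 2)
print(ridge(X, y, lam=1e-8))   # close to [1, 2]; larger lam shrinks w towards 0
```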
Alternate between the two regressions: regress \(w_u\) given X, then regress \(x_i\) given W.
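The alternation can be sketched as one alternating-least-squares sweep of per-user and per-item ridge solves; the toy data below is fully observed and hypothetical:

```python
import numpy as np

def als_sweep(R_obs, W, X, lam=0.1):
    """One ALS sweep: a ridge solve for every user vector w_u holding X
    fixed, then for every item vector x_i holding W fixed."""
    k = W.shape[1]
    for u in range(W.shape[0]):
        items = [i for (uu, i) in R_obs if uu == u]
        if items:
            A = X[items]
            r = np.array([R_obs[(u, i)] for i in items])
            W[u] = np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ r)
    for i in range(X.shape[0]):
        users = [u for (u, ii) in R_obs if ii == i]
        if users:
            A = W[users]
            r = np.array([R_obs[(u, i)] for u in users])
            X[i] = np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ r)
    return W, X

# Recover a rank-1 table r_ui = (u+1)(i+1) with rank-2 factors
rng = np.random.default_rng(1)
R_obs = {(u, i): float((u + 1) * (i + 1)) for u in range(4) for i in range(3)}
W, X = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
for _ in range(20):
    W, X = als_sweep(R_obs, W, X, lam=1e-3)
print(max(abs(r - W[u] @ X[i]) for (u, i), r in R_obs.items()))  # small residual
```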
[Figure: factor models, RMSE (0.875-0.91) vs. millions of parameters (10-100,000, log scale); curves for NMF, BiasSVD, SVD++, SVD v.2, v.3, v.4, with factor dimensions ranging from 40 to 1500]
Add biases: do SGD, but also learn the biases μ, b_u and b_i.
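A sketch of SGD with biases; the learning rate `eta` and regularizer `lam` are illustrative choices, and μ is held fixed at the global mean rather than learned:

```python
import numpy as np

def sgd_epoch(R_obs, mu, b_u, b_i, W, X, eta=0.01, lam=0.05):
    """One SGD pass over observed ratings, updating the biases b_u, b_i and
    the factors w_u, x_i from the residual e = r - r_hat."""
    for (u, i), r in R_obs.items():
        e = r - (mu + b_u[u] + b_i[i] + W[u] @ X[i])
        b_u[u] += eta * (e - lam * b_u[u])
        b_i[i] += eta * (e - lam * b_i[i])
        W[u], X[i] = (W[u] + eta * (e * X[i] - lam * W[u]),
                      X[i] + eta * (e * W[u] - lam * X[i]))

# Toy run on four observed ratings
rng = np.random.default_rng(0)
R_obs = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 2.0}
mu = sum(R_obs.values()) / len(R_obs)
b_u, b_i = np.zeros(2), np.zeros(2)
W, X = 0.1 * rng.normal(size=(2, 2)), 0.1 * rng.normal(size=(2, 2))
for _ in range(200):
    sgd_epoch(R_obs, mu, b_u, b_i, W, X)
print(mu + b_u[0] + b_i[0] + W[0] @ X[0])  # approaches the observed 5.0
```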
Account for the fact that ratings are not missing at random.
"who rated what"
Temporal effects: account for drift in the user and item biases.
Still pretty far from the 0.8563 grand-prize target.
June 26th submission triggers 30-day “last call”