Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining
CSE 6240: Web Search and Text Mining. Spring 2020
Recommendation Systems: Part II
- Prof. Srijan Kumar
http://cc.gatech.edu/~srijan
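[Figure: movies placed in a two-dimensional latent factor space with axes Factor 1 and Factor 2. Axis end labels: "geared towards females" / "geared towards males" and "serious" / "funny". Movies shown: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, Dumb and Dumber, Ocean’s 11, Sense and Sensibility.]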
[Figure: the m × n matrix A.]
$\min_{U,\Sigma,V} \sum_{i,j} \left( A_{ij} - [U \Sigma V^T]_{ij} \right)^2$
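A minimal sketch of the reconstruction-error view of SVD, assuming a small fully observed matrix (the values below are made up for illustration, and `truncated_svd_sse` is a hypothetical helper name). It keeps the top k singular values and checks that the sum of squared errors equals the sum of the squared singular values that were dropped.

```python
import numpy as np

def truncated_svd_sse(A, k):
    """Return the rank-k SVD approximation of A and its sum of squared errors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # U_k * Sigma_k * V_k^T
    sse = float(np.sum((A - A_k) ** 2))
    return A_k, sse

# Hypothetical 4 x 3 matrix, made up for illustration.
A = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [2.0, 1.0, 4.0]])
singular_values = np.linalg.svd(A, compute_uv=False)

for k in (1, 2, 3):
    _, sse = truncated_svd_sse(A, k)
    # The SSE matches the sum of squared singular values that were discarded.
    print(k, sse, float(np.sum(singular_values[k:] ** 2)))
```

Note that this only works directly when every entry of the matrix is known; a ratings matrix with missing entries needs a different fitting procedure.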
[Figure: the ratings matrix (items × users, ratings from 1 to 5) approximated by the product of Q (items × factors) and P^T (factors × users).]
[Figure: the ratings matrix (items × users) with one missing entry marked "?", shown next to its factors Q (items × factors) and P^T (factors × users).]
Estimated rating of user x for item i:
$\hat{r}_{xi} = q_i \cdot p_x = \sum_f q_{if} \, p_{xf}$
q_i = row i of Q
p_x = column x of P^T
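A short sketch of the prediction rule, assuming two latent factors. The factor values are made up for illustration (they are not the numbers from the slide), and `predict` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical factor values, made up for illustration.
Q = np.array([[0.5, 1.0],    # item 0
              [1.5, -0.5],   # item 1
              [0.2, 2.0]])   # item 2  -> items x factors
P = np.array([[2.0, 0.5],    # user 0
              [1.0, 1.0]])   # user 1  -> users x factors (P.T is factors x users)

def predict(Q, P, x, i):
    """Estimated rating of user x for item i: dot product of q_i and p_x."""
    return float(Q[i] @ P[x])

R_hat = Q @ P.T              # all estimates at once, items x users
print(predict(Q, P, x=0, i=1))
```

Each entry of `R_hat` is the same dot product, so a single missing rating can be filled in without forming the whole matrix.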
[Figure: the ratings matrix and its factors Q and P^T, with the missing entry "?" estimated as 2.4.]
[Figure: the movies plotted in the Factor 1 / Factor 2 plane, with axis end labels "geared towards females" / "geared towards males" and "serious" / "funny".]
Movies plotted in two dimensions. Dimensions have meaning.
[Figure: the Factor 1 / Factor 2 plane with the movies, now with users placed in it as well.]
Users fall in the same space, showing their preferences.
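$\min_{P,Q} \sum_{(x,i) \in \text{training}} \left( r_{xi} - q_i \cdot p_x \right)^2 + \left[ \lambda_1 \sum_x \|p_x\|^2 + \lambda_2 \sum_i \|q_i\|^2 \right]$
The first term is the “error”; the bracketed term is the “length”.
λ1, λ2: user-set regularization parameters
Note: We do not care about the “raw” value of the objective function; we care about the P, Q that achieve its minimum.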
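One common way to minimize a regularized objective of this kind is stochastic gradient descent. The sketch below is an illustration under stated assumptions, not necessarily the procedure used in the lecture: the toy ratings, the learning rate `lr`, the epoch count, and the names `fit_mf` and `objective` are all hypothetical. The factor 2 from the derivative is absorbed into the learning rate, and regularization is applied once per observed rating, which is a common simplification.

```python
import numpy as np

def fit_mf(ratings, n_users, n_items, k=2, lam1=0.1, lam2=0.1,
           lr=0.02, epochs=500, seed=0):
    """Fit P (users x k) and Q (items x k) by SGD on (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            x, i, r = ratings[idx]
            err = r - Q[i] @ P[x]            # r_xi - q_i . p_x
            p_old = P[x].copy()              # use the old p_x when updating q_i
            P[x] += lr * (err * Q[i] - lam1 * P[x])
            Q[i] += lr * (err * p_old - lam2 * Q[i])
    return P, Q

def objective(ratings, P, Q, lam1, lam2):
    """Squared error on the training ratings plus the regularization term."""
    error = sum((r - Q[i] @ P[x]) ** 2 for x, i, r in ratings)
    length = lam1 * np.sum(P ** 2) + lam2 * np.sum(Q ** 2)
    return float(error + length)

# Hypothetical toy data: (user, item, rating).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = fit_mf(ratings, n_users=3, n_items=3)
print(objective(ratings, P, Q, 0.1, 0.1))
```

Larger λ values shrink the factor vectors and trade training error for smaller "length".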
[Figure: movies in the Factor 1 / Factor 2 plane, with axis end labels "geared towards females" / "geared towards males" and "serious" / "funny". Movies shown: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean’s 11, Sense and Sensibility.]
Rating components: user bias, movie bias, user-movie interaction
- μ = overall mean rating
- b_x = bias of user x
- b_i = bias of movie i
Baseline predictor
- Separates users and movies
- Benefits from insights into user’s behavior
- Among the main practical contributions of the competition
User-movie interaction
- Characterizes the matching between users and movies
- Attracts most research in the field
- Benefits from algorithmic and mathematical innovations
– Rating scale of user x
– Values of other ratings user gave recently (day-specific mood, anchoring, multi-user accounts)
– (Recent) popularity of movie i
– Selection bias; related to number of ratings user gave on the same day (“frequency”)
$r_{xi} = \mu + b_x + b_i + q_i \cdot p_x$
μ: overall mean rating; b_x: bias for user x; b_i: bias for movie i; q_i · p_x: user-movie interaction
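As an illustration of the bias terms alone (without the user-movie interaction), the sketch below estimates μ, b_x and b_i from simple averages. This averaging scheme is an assumption chosen for brevity, not necessarily how the lecture estimates the biases, and the ratings and function names are hypothetical.

```python
from collections import defaultdict

def fit_baseline(ratings):
    """Estimate the overall mean, user biases and movie biases by averaging."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    user_dev = defaultdict(list)
    for x, _, r in ratings:
        user_dev[x].append(r - mu)
    b_user = {x: sum(d) / len(d) for x, d in user_dev.items()}
    # Movie bias: average of what is left after removing mu and the user bias.
    movie_dev = defaultdict(list)
    for x, i, r in ratings:
        movie_dev[i].append(r - mu - b_user[x])
    b_movie = {i: sum(d) / len(d) for i, d in movie_dev.items()}
    return mu, b_user, b_movie

def predict_baseline(mu, b_user, b_movie, x, i):
    """mu + b_x + b_i; unseen users or movies fall back to a bias of 0."""
    return mu + b_user.get(x, 0.0) + b_movie.get(i, 0.0)

# Hypothetical toy ratings: (user, movie, rating).
ratings = [("ann", "A", 5), ("ann", "B", 3), ("bob", "A", 4), ("bob", "B", 2)]
mu, b_user, b_movie = fit_baseline(ratings)
print(mu, b_user, b_movie)
print(predict_baseline(mu, b_user, b_movie, "ann", "A"))
```

In the full model these biases are added to the latent-factor term q_i · p_x rather than used on their own.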
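$\min_{Q,P} \sum_{(x,i) \in R} \left( r_{xi} - (\mu + b_x + b_i + q_i \cdot p_x) \right)^2 + \left( \lambda_1 \sum_x \|p_x\|^2 + \lambda_2 \sum_i \|q_i\|^2 + \lambda_3 \sum_x \|b_x\|^2 + \lambda_4 \sum_i \|b_i\|^2 \right)$
The first term is the goodness of fit; the second term is the regularization.
λ is selected via grid search on a validation set.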
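For reference, a sketch of the stochastic gradient descent updates that follow from an objective of this form. The assumptions here are that λ1, λ2, λ3, λ4 regularize p_x, q_i, b_x, b_i respectively, that μ is fixed to the overall mean rating, and that η is a learning rate (a symbol introduced for this sketch) which absorbs the factor 2 from the derivative. For each observed rating r_xi:

```latex
\begin{align*}
e_{xi} &= r_{xi} - (\mu + b_x + b_i + q_i \cdot p_x) \\
b_x &\leftarrow b_x + \eta \, (e_{xi} - \lambda_3 b_x) \\
b_i &\leftarrow b_i + \eta \, (e_{xi} - \lambda_4 b_i) \\
p_x &\leftarrow p_x + \eta \, (e_{xi} \, q_i - \lambda_1 p_x) \\
q_i &\leftarrow q_i + \eta \, (e_{xi} \, p_x - \lambda_2 q_i)
\end{align*}
```

The p_x on the right-hand side of the last line is the value before the update on the line above it.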
Netflix Prize RMSE (lower is better):
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
Netflix: 0.9514
Basic Collaborative filtering: 0.94
Collaborative filtering++: 0.91
Latent factors: 0.90
Latent factors + Biases: 0.89
Grand Prize: 0.8563
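The scores above are root-mean-square errors. A minimal sketch of the metric follows; the function name and the example numbers are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical predictions and true ratings: each prediction is off by 1.
print(rmse([2.0, 4.0], [3.0, 3.0]))
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones.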
Global effects, factorization, collaborative filtering
Global effects
Factorization
CF/NN
KDD ’09
with temporal dynamics, KDD ’09
[Figure: RMSE (about 0.875 to 0.92) versus millions of parameters (1 to 10,000, log scale). Series: CF (no time bias), Basic Latent Factors, CF (time bias), Latent Factors w/ Biases, + Linear time factors, + Per-day user biases, + CF.]
Netflix Prize RMSE (lower is better):
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
Netflix: 0.9514
Basic Collaborative filtering: 0.94
Collaborative filtering++: 0.91
Latent factors: 0.90
Latent factors + Biases: 0.89
Latent factors + Biases + Time: 0.876
Grand Prize: 0.8563
June 26th submission triggers 30-day “last call”
set of predictions
posts a score that is slightly better than BellKor’s
before deadline