SLIDE 1 Online Ranking Combination
Erzsébet Frigó
- Institute for Computer Science and Control (MTA SZTAKI)
Joint work with Levente Kocsis
SLIDE 2
Overview
◮ Framework: prequential ranking evaluation
◮ Goal: optimize a convex combination of ranking models
◮ Our proposal: direct optimization of the ranking function
SLIDE 3 Model combination in prequential framework with ranking evaluation
[Figure: a stream of user–item events over time; base rankers A1, A2, A3 assign scores to items i1, …, im for user u, each producing a ranking list, and the lists are combined into a single ranking]
SLIDE 4 Model combination in prequential framework with ranking evaluation
[Figure repeated from Slide 3]
Objective: choosing combination weights.
SLIDE 5
New idea: optimize ranking function directly
◮ Standard method: take a surrogate function and use its gradient
◮ e.g., MSE
◮ Drawback: the optimum of the surrogate ≠ the optimum of the ranking function
◮ Proposed solution: optimize the ranking function directly
◮ Two approaches:
◮ global search in the weight space
◮ gradient approximation (finite differences)
SLIDE 6 ExpW
◮ Choose a subset Q of the weight space Θ
◮ e.g., lay a grid over the parameter space
◮ Apply the exponentially weighted forecaster on Q:

P(\text{select } q \in Q \text{ in round } t) = \frac{e^{-\eta_t \sum_{\tau=1}^{t-1} (1 - r_\tau(q))}}{\sum_{s \in Q} e^{-\eta_t \sum_{\tau=1}^{t-1} (1 - r_\tau(s))}}

where r_τ(q) is the ranking reward of combination q in round τ.
◮ Theoretical guarantee: E[R_T(best static combination in Θ) − R_T(ExpW)] ≤ O(√T)
◮ if the cumulative reward function R_T is sufficiently smooth
◮ and Q is sufficiently large
◮ Difficulty: the size of Q is exponential in the number of base rankers, so the approach cannot scale
SLIDE 7
Simultaneous Perturbation Stochastic Approximation (SPSA)
◮ Approximated gradient (for the weight of base ranker i in round t):

g_{ti} = \frac{r_t(\theta_t + c_t \Delta_t) - r_t(\theta_t - c_t \Delta_t)}{c_t \Delta_{ti}}
◮ θ_t is the current combination weight vector
◮ Δ_t = (Δ_{t1}, …) is a random vector with ±1 entries
◮ c_t is the perturbation step size
◮ Online update step: one gradient step using the approximated gradient
SLIDE 12
RSPSA
◮ RSPSA = SPSA + Resilient Backpropagation (RProp)
◮ RProp defines a gradient step size for each weight
◮ The perturbation step size is tied to the gradient step size
◮ Step sizes are updated using RProp
SLIDE 13
Resilient Backpropagation (RProp)
◮ Gradient update rule
◮ Predefined step size for each coordinate
◮ ignores the length of the gradient vector
◮ Step size is updated based on the sign of the gradient
◮ decrease the step if the gradient changed direction
◮ increase it otherwise
SLIDE 14
g_{ti} = \frac{r_t(\theta_t + c_t \Delta_t) - r_t(\theta_t - c_t \Delta_t)}{c_t \Delta_{ti}}
SLIDE 15
RFDSA+
◮ Switch to finite differences (FD)
◮ allows detecting a zero gradient w.r.t. a single coordinate
◮ If the gradient is 0 w.r.t. a coordinate, then
◮ increase the perturbation size (+) for that coordinate
◮ escape the flat section in the right direction
◮ RFDSA+ = RSPSA − simultaneous perturbation + finite differences + zero-gradient detection
◮ The modifications may seem minor, but they are essential to make the algorithm work
SLIDE 16
Experiments - Datasets, base rankers
◮ 5 datasets
◮ Amazon
◮ CDs and Vinyl
◮ Movies and TV
◮ Electronics
◮ MovieLens 10M
◮ Twitter
◮ hashtag prediction
◮ Size
◮ # of events: 2M–10M
◮ # of users: 70k–4M
◮ # of items: 10k–100k
◮ Base rankers:
◮ Models updated incrementally:
◮ SGD Matrix Factorization
◮ Asymmetric Matrix Factorization
◮ Item-to-item similarity
◮ Most popular
◮ Traditional models updated periodically:
◮ SGD Matrix Factorization
◮ Implicit Alternating Least Squares MF
SLIDE 17
Combination algorithms in the experiments
Direct optimization:
◮ ExpW
◮ exponentially weighted forecaster on a grid
◮ global optimization
◮ SPSA
◮ gradient method with simultaneous perturbation
◮ RSPSA
◮ SPSA with RProp
◮ RFDSA+
◮ our new algorithm
◮ finite differences, flat-section detection
Baselines:
◮ ExpA
◮ exponentially weighted forecaster on the base rankers
◮ ExpAW
◮ use the probabilities of ExpA as weights
◮ SGD
◮ use MSE as a surrogate
◮ target = 1 for the positive sample
◮ target = 0 for generated negative samples
SLIDE 18 Results - 2 base rankers (i2i, OMF) - nDCG
[Figure: NDCG over time (days 1000–7000) for item2item, OMF, ExpA, ExpAW, ExpW, SGD, SPSA, RSPSA, RFDSA+]
SLIDE 19 Results - 2 base rankers - Combination weights
[Figure: combination weight θ over time (log scale, 1e-6 to 1) for OptG100+, ExpAW, SGD, SPSA, RSPSA, RFDSA+]
SLIDE 20 Cumulative reward as function of combination weight
[Figure: cumulative reward R_T(θ) (NDCG) as a function of the combination weight θ]
SLIDE 21 Results - Scalability
[Figure: NDCG as a function of the number of OMFs (1 to 10) for ExpA, ExpAW, SGD, SPSA, RSPSA, RFDSA+]
SLIDE 22
Results - 6 base rankers - DCG
SLIDE 23
Conclusions
◮ Problem: combine ranking algorithms online
◮ Our proposal: optimize the ranking measure directly
◮ Global optimization (ExpW) works well in the case of two base algorithms
◮ Our new algorithm: RFDSA+
◮ solves the remaining problems (scaling, constant sections w.r.t. one coordinate)
◮ yields a strong combination
SLIDE 24 The End Online Ranking Combination
Erzsébet Frigó
- Institute for Computer Science and Control (MTA SZTAKI)
Joint work with Levente Kocsis