  1. Online Ranking Combination
     Erzsébet Frigó, Institute for Computer Science and Control (MTA SZTAKI)
     Joint work with Levente Kocsis

  2. Overview
     ◮ Framework: prequential ranking evaluation
     ◮ Goal: optimize a convex combination of ranking models
     ◮ Our proposal: direct optimization of the ranking function

  3. Model combination in prequential framework with ranking evaluation
     [Diagram: at each time step, base rankers A1, A2, A3 each score the items i1 ... im for user u; their scores are combined into a single ranking list.]

  4. Model combination in prequential framework with ranking evaluation
     [Same diagram as the previous slide: the base rankers' scores are combined into a single ranking list.]
     Objective: choosing the combination weights.
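To make the framework concrete, here is a minimal Python sketch of one prequential round under my reading of the diagram (the array values, the item indices, and the toy discounted-gain reward are illustrative assumptions, not taken from the slides): each base ranker scores the candidate items, the score vectors are mixed with a convex weight vector θ, and the round's reward is a rank-based gain of the item the user actually interacted with.

    import numpy as np

    def combined_scores(base_scores, theta):
        """Convex combination of the base rankers' score vectors.
        base_scores: (n_rankers, n_items) array, one row per base ranker.
        theta:       (n_rankers,) non-negative weights summing to 1."""
        return theta @ base_scores

    def rank_reward(scores, consumed_item):
        """Toy prequential reward: discounted gain of the item the user
        actually consumed, based on its position in the combined list."""
        rank = int((scores > scores[consumed_item]).sum()) + 1  # 1-based rank
        return 1.0 / np.log2(rank + 1)

    # One round with three base rankers (A1, A2, A3) and three items:
    base_scores = np.array([[0.9, 0.2, 0.1],   # scores from A1
                            [0.1, 0.8, 0.3],   # scores from A2
                            [0.4, 0.4, 0.9]])  # scores from A3
    theta = np.array([0.5, 0.3, 0.2])
    reward = rank_reward(combined_scores(base_scores, theta), consumed_item=1)

Choosing θ online, round after round, is exactly the objective stated on the slide.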

  5. New idea: optimize the ranking function directly
     ◮ Standard method: take a surrogate function and use its gradient
       ◮ e.g., MSE
       ◮ Drawback: optimum of the surrogate ≠ optimum of the ranking function
     ◮ Proposed solution: optimize the ranking function directly
     ◮ Two approaches:
       ◮ Global search in the weight space
       ◮ Gradient approximation (finite differences)

  6. ExpW
     ◮ Choose a subset Q of the weight space Θ
       ◮ e.g., lay a grid over the parameter space
     ◮ Apply the exponentially weighted forecaster on Q:

       P(select q ∈ Q in round t) = exp(−η_t Σ_{τ=1}^{t−1} (1 − r_τ(q))) / Σ_{s∈Q} exp(−η_t Σ_{τ=1}^{t−1} (1 − r_τ(s)))

     ◮ Theoretical guarantee:
       E[R_T(best static combination in Θ) − R_T(ExpW)] < O(√T)
       ◮ if the cumulative reward function R_T is sufficiently smooth
       ◮ and Q is sufficiently large
     ◮ Difficulty: the size of Q is exponential in the number of base rankers, so it cannot scale
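A minimal Python sketch of the ExpW idea as I read the formula (the grid over a two-ranker simplex, the constant learning rate, and the toy reward are my assumptions, not the authors' settings): keep one cumulative loss per grid point, where the per-round loss is 1 − r_t(q), and sample the grid point to use with exponential weights.

    import numpy as np

    rng = np.random.default_rng(0)

    # Grid Q over the weight space for two base rankers: theta = (w, 1 - w)
    Q = np.array([[w, 1.0 - w] for w in np.linspace(0.0, 1.0, 11)])

    def toy_reward(theta):
        """Stand-in for the prequential ranking reward r_t(theta);
        in the real system this is the ranking gain of the combined list."""
        return max(0.0, 1.0 - (theta[0] - 0.7) ** 2 + 0.01 * rng.standard_normal())

    cum_loss = np.zeros(len(Q))   # per grid point: sum over rounds of 1 - r_tau(q)
    eta = 0.5                     # learning rate eta_t, kept constant here

    for t in range(1, 1001):
        logits = -eta * cum_loss
        probs = np.exp(logits - logits.max())          # numerically stable softmax
        probs /= probs.sum()                           # P(select q in round t)
        theta_t = Q[rng.choice(len(Q), p=probs)]       # weights used for ranking now
        # After observing the round, every grid point can be scored (full information):
        cum_loss += 1.0 - np.array([toy_reward(q) for q in Q])

With a single free weight an 11-point grid is cheap; the difficulty noted on the slide is that a comparably fine grid over many base rankers has exponentially many points.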

  7. Simultaneous Perturbation Stochastic Approximation (SPSA)
     ◮ Approximated gradient (for the weight of base ranker i in round t):

       g_ti = (r_t(θ_t + c_t Δ_t) − r_t(θ_t − c_t Δ_t)) / (c_t Δ_ti)

     ◮ θ_t is the current combination weight vector
     ◮ Δ_t = (Δ_t1, ...) is a random vector of ±1 entries
     ◮ c_t is the perturbation step size
     ◮ Online update step: one gradient step using the approximated gradient
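A minimal Python sketch of this update (the toy reward, the constant step sizes, and the plain ascent step are illustrative assumptions): two reward evaluations per round, at θ_t ± c_t Δ_t, give a gradient estimate for every coordinate at once.

    import numpy as np

    rng = np.random.default_rng(0)

    def toy_reward(theta):
        """Stand-in for the per-round ranking reward r_t(theta)."""
        return 1.0 - float(np.sum((theta - np.array([0.7, 0.3])) ** 2))

    theta = np.array([0.5, 0.5])   # current combination weights theta_t
    c = 0.05                       # perturbation step size c_t
    a = 0.1                        # gradient (ascent) step size

    for t in range(200):
        delta = rng.choice([-1.0, 1.0], size=theta.shape)   # random +/-1 vector Delta_t
        r_plus = toy_reward(theta + c * delta)               # only two evaluations,
        r_minus = toy_reward(theta - c * delta)               # whatever the dimension
        g = (r_plus - r_minus) / (c * delta)                  # approximated gradient g_t
        theta = theta + a * g                                 # one step on the reward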

  8. RSPSA
     ◮ RSPSA = SPSA + Resilient Backpropagation (RProp)

  9. RSPSA
     ◮ RSPSA = SPSA + Resilient Backpropagation (RProp)
     ◮ RProp defines gradient step sizes for each weight
     ◮ The perturbation step size is tied to the gradient step size
     ◮ Update the step sizes using RProp
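A minimal sketch of how this coupling could look in code, assuming the RProp-style rule described on the next slide (the multipliers 1.2/0.5 and the plain signed step are my assumptions, not the authors' parameterisation): the per-coordinate step size doubles as the SPSA perturbation size and is adapted from the sign of successive gradient estimates.

    import numpy as np

    def rspsa_round(theta, reward_fn, step, prev_sign, rng,
                    eta_plus=1.2, eta_minus=0.5):
        """One illustrative RSPSA round (sketch, not the exact algorithm)."""
        c = step                                        # perturbation tied to the step size
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        g = (reward_fn(theta + c * delta) - reward_fn(theta - c * delta)) / (c * delta)
        sign = np.sign(g)
        step = np.where(sign * prev_sign > 0, step * eta_plus, step)   # stable direction
        step = np.where(sign * prev_sign < 0, step * eta_minus, step)  # direction flipped
        theta = theta + sign * step                     # RProp-style signed ascent step
        return theta, step, sign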

  10. Resilient Backpropagation (RProp)
      ◮ Gradient update rule
      ◮ Predefined step size for each coordinate
        ◮ ignores the length of the gradient vector
      ◮ Step size is updated based on the sign of the gradient
        ◮ decrease the step if the gradient changed direction
        ◮ increase it otherwise
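A minimal Python sketch of this rule (the growth/shrink factors and the step-size bounds are common defaults, assumed here; written as ascent on a reward): only the sign of the gradient is used, and each coordinate keeps its own step size.

    import numpy as np

    def rprop_step(theta, grad, prev_grad, step,
                   eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=1.0):
        """One RProp-style per-coordinate update."""
        agree = grad * prev_grad
        step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)  # keep growing
        step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step) # sign flipped
        theta = theta + np.sign(grad) * step          # length of the gradient is ignored
        return theta, step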

  11. g_ti = (r_t(θ_t + c_t Δ_t) − r_t(θ_t − c_t Δ_t)) / (c_t Δ_ti)

  12. RFDSA+
      ◮ Switch to finite differences (FD)
        ◮ allows detecting a zero gradient w.r.t. a single coordinate
      ◮ If the gradient is 0 w.r.t. a coordinate, then
        ◮ increase the perturbation size (+) for that coordinate
        ◮ escape the flat section in the right direction
      ◮ RFDSA+ = RSPSA − simultaneous perturbation + finite differences + zero-gradient detection
      ◮ The modifications might seem minor, but they are essential to make the algorithm work
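A minimal Python sketch of the coordinate-wise idea, heavily hedged: the growth factor for the perturbation, the plain signed step, and the omitted RProp step-size adaptation are all my simplifications, so this only illustrates the flat-section handling rather than the authors' exact algorithm.

    import numpy as np

    def rfdsa_plus_round(theta, reward_fn, step, c, c_grow=2.0):
        """One illustrative RFDSA+-style round.

        Finite differences are taken coordinate by coordinate, so a zero
        difference for coordinate i is detectable; the perturbation c[i]
        is then widened (the "+") instead of stepping in that coordinate,
        which lets the search escape flat sections."""
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            e = np.zeros_like(theta)
            e[i] = c[i]
            diff = reward_fn(theta + e) - reward_fn(theta - e)
            if diff == 0.0:          # flat section w.r.t. coordinate i
                c[i] *= c_grow       # probe further out next time
            else:
                grad[i] = diff / (2.0 * c[i])
        theta = theta + np.sign(grad) * step   # RProp-style signed step
        return theta, step, c                  # (step-size adaptation omitted for brevity)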

  13. Experiments - Datasets, base rankers
      ◮ 5 datasets
        ◮ Amazon: CDs and Vinyl, Movies and TV, Electronics
        ◮ MovieLens 10M
        ◮ Twitter (hashtag prediction)
      ◮ Size
        ◮ # of events: 2M-10M
        ◮ # of users: 70k-4M
        ◮ # of items: 10k-100k
      ◮ Base rankers
        ◮ Models updated incrementally: SGD Matrix Factorization, Asymmetric Matrix Factorization, Item-to-item similarity, Most popular
        ◮ Traditional models updated periodically: SGD Matrix Factorization, Implicit Alternating Least Squares MF

  14. Combination algorithms in the experiments
      Direct optimization:
      ◮ ExpW: exponentially weighted forecaster on a grid; global optimization
      ◮ SPSA: gradient method with simultaneous perturbation
      ◮ RSPSA: SPSA with RProp
      ◮ RFDSA+: our new algorithm; finite differences, flat-section detection
      Baselines:
      ◮ ExpA: exponentially weighted forecaster on the base rankers
      ◮ ExpAW: use the probabilities of ExpA as weights
      ◮ SGD: use MSE as a surrogate; target = 1 for the positive sample, target = 0 for generated negative samples
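The SGD baseline is described only at this level on the slide, so the following Python sketch fills in the obvious details as assumptions (the learning rate, the number of sampled negatives, and uniform negative sampling): one squared-error step on the combination weights per positive event plus a few generated negatives.

    import numpy as np

    rng = np.random.default_rng(0)

    def sgd_mse_update(theta, base_scores, pos_item, lr=0.01, n_neg=5):
        """One SGD step on the MSE surrogate over the combination weights.

        base_scores: (n_rankers, n_items) scores from the base rankers.
        pos_item:    the consumed item (target 1); n_neg items sampled
                     from the rest serve as negatives (target 0)."""
        candidates = np.setdiff1d(np.arange(base_scores.shape[1]), [pos_item])
        negatives = rng.choice(candidates, size=n_neg, replace=False)
        for item, target in [(pos_item, 1.0)] + [(j, 0.0) for j in negatives]:
            pred = float(theta @ base_scores[:, item])
            theta = theta - lr * (pred - target) * base_scores[:, item]  # d(MSE)/d(theta)
        return theta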

  15. Results - 2 base rankers (i2i, OMF) - nDCG
      [Plot: nDCG over ~7000 days for the base rankers item2item and OMF and the combinations ExpA, ExpAW, ExpW, SGD, SPSA, RSPSA, RFDSA+.]

  16. Results - 2 base rankers - Combination weights
      [Plot: combination weight θ (log scale) over ~7000 days for OptG100+, ExpAW, SGD, SPSA, RSPSA, RFDSA+.]

  17. Cumulative reward as a function of the combination weight
      [Plot: cumulative reward R_T(θ) in nDCG against the combination weight θ on a log scale.]

  18. Results - Scalability
      [Plot: nDCG against the number of OMF base rankers (1-10) for ExpA, ExpAW, SGD, SPSA, RSPSA, RFDSA+.]

  19. Results - 6 base rankers - DCG

  20. Conclusions
      ◮ Problem: combining ranking algorithms
      ◮ Our proposal: optimize the ranking measure directly
      ◮ Global optimization (ExpW) works well in the case of two base algorithms
      ◮ Our new algorithm, RFDSA+:
        ◮ solves the remaining problems (scaling, constant sections w.r.t. one coordinate)
        ◮ yields a strong combination

  21. The End
      Online Ranking Combination
      Erzsébet Frigó, Institute for Computer Science and Control (MTA SZTAKI)
      Joint work with Levente Kocsis
