

  1. Efficiency/Effectiveness Trade-offs in Learning to Rank. Tutorial @ ICTIR 2017, http://learningtorank.isti.cnr.it/. Claudio Lucchese (Ca’ Foscari University of Venice, Venice, Italy) and Franco Maria Nardini (HPC Lab, ISTI-CNR, Pisa, Italy).

  2. The Ranking Problem. Ranking is at the core of several IR tasks:
• Document ranking in Web search
• Ads ranking in Web advertising
• Query suggestion & completion
• Product recommendation
• Song recommendation
• …

  3. The Ranking Problem. Definition: given a query q and a set of objects/documents D, rank D so as to maximize user satisfaction Q.
• Goal #1: Effectiveness. Maximize Q! But how do we measure Q?
• Goal #2: Efficiency. Make sure the ranking process is feasible and not too expensive. At Bing, "every 100msec improves revenue by 0.6%. Every millisecond counts." [KDF+13]
[KDF+13] Kohavi, R., Deng, A., Frasca, B., Walker, T., Xu, Y., and Pohlmann, N. Online controlled experiments at large scale. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1168–1176. ACM, 2013.

  4. Agenda
1. Introduction to Learning to Rank (LtR): background, algorithms, sources of cost in LtR, multi-stage ranking
2. Dealing with the efficiency/effectiveness trade-off: feature selection, enhanced learning, approximate scoring, fast scoring
3. Hands-on I: software, data, and publicly available tools; traversing regression forests, state-of-the-art tools and analysis
4. Hands-on II: training models, pruning strategies, efficient scoring
At the end of the day you’ll be able to train a high-quality ranking model and to exploit state-of-the-art tools and techniques to reduce its computational cost by up to 18x!

  5. Document Representations and Ranking
Document representations:
• A document is a multi-set of words.
• A document may have fields, it can be split into zones, and it can be enriched with external text data (e.g., anchors).
• Additional information may be useful, such as in-links, out-links, PageRank, # clicks, social links, etc.
• Public LtR datasets expose hundreds of signals.
Ranking functions:
• Term weighting [SJ72]
• Vector space model [SB88]
• BM25 [JWR00], BM25F [RZT04]
• Language modeling [PC98]
• Linear combination of features [MC07]
How to combine hundreds of signals?
[SJ72] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21, 1972.
[SB88] Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
[JWR00] Karen Spärck Jones, Steve Walker, and Stephen E. Robertson. A probabilistic model of information retrieval: development and comparative experiments. Information Processing & Management, 36(6):809–840, 2000.
[RZT04] Stephen Robertson, Hugo Zaragoza, and Michael Taylor. Simple BM25 extension to multiple weighted fields. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pages 42–49. ACM, 2004.
[PC98] Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275–281. ACM, 1998.
[MC07] Donald Metzler and W. Bruce Croft. Linear feature-based models for information retrieval. Information Retrieval, 10(3):257–274, 2007.
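To make the classical scoring functions above concrete, here is a minimal sketch of BM25 [JWR00] (not part of the original slides; k1 = 1.2 and b = 0.75 are customary defaults, and all names and corpus statistics are illustrative):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs,
               k1=1.2, b=0.75):
    """Minimal BM25: for each query term, IDF times a saturated,
    length-normalized term frequency."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        idf = math.log(1 + (num_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```

Each signal of this kind (BM25, language-model scores, PageRank, click counts, …) becomes one feature of a query-document pair; LtR is about learning how to combine them.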

  6. Ranking as a Supervised Learning Task
A training instance is a query q together with its candidate documents d_1, d_2, d_3, …, d_i and their relevance labels y_1, y_2, y_3, …, y_i. A machine learning algorithm (neural net, SVM, decision tree) minimizes a loss function over such instances and outputs a ranking model.
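A minimal sketch of this training-instance layout (the class and field names are illustrative assumptions, not from the tutorial):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingInstance:
    """One query with its candidate documents and their relevance labels."""
    query_id: str
    features: List[List[float]]  # one feature vector per candidate document
    labels: List[int]            # one graded relevance judgment per document

# Toy instance: three candidate documents described by two features each.
instance = TrainingInstance(
    query_id="q1",
    features=[[0.3, 12.0], [0.9, 3.0], [0.1, 7.5]],
    labels=[1, 3, 0],
)
```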

  7. Relevance Labels
Query/document representation: useful signals include
• Link analysis [H+00]
• Term proximity [RS03]
• Query classification [BSD10]
• Query intent mining [JLN16, LOP+13]
• Finding entities in documents [MW08] and in queries [BOM15]
• Document recency [DZK+10]
• Distributed representations of words and their compositionality [MSC+13]
• Convolutional neural networks [SHG+14]
• …
Label generation:
• Explicit feedback: thousands of search quality raters; absolute vs. relative judgments [CBCD08]
• Implicit feedback: clicks/query chains [JGP+05, Joa02, RJ05]; de-biasing/click models [JSS17]
• Minimizing annotation cost: active learning [LCZ+10]; deep versus shallow labelling [YR09]

  8. Evaluation Measures for Ranking
Many are of the form $Q@k = \sum_{r=1}^{k} \mathrm{Gain}(d_r) \cdot \mathrm{Discount}(r)$:
• (N)DCG [JK00]: $\mathrm{Gain}(d) = 2^y - 1$, $\mathrm{Discount}(r) = 1/\log(r+1)$
• RBP [MZ08]: $\mathrm{Gain}(d) = I(y)$, $\mathrm{Discount}(r) = (1-p)\,p^{r-1}$
• ERR [CMZG09]: $\mathrm{Gain}(d_i) = R_i \prod_{j=1}^{i-1}(1 - R_j)$ with $R_i = (2^y - 1)/2^{y_{\max}}$, $\mathrm{Discount}(r) = 1/r$
Do they match user satisfaction?
• ERR correlates better with user satisfaction (clicks and editorial judgments) [CMZG09]
• Interleaving of results can be used to compare two rankings [CJRY12]
• "Major revisions of the web search rankers [Bing] ... The differences between these rankers involve changes of over half a percentage point, in absolute terms, of NDCG."
[JK00] Kalervo Järvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41–48. ACM, 2000.
[MZ08] Alistair Moffat and Justin Zobel. Rank-biased precision for measurement of retrieval effectiveness. ACM Transactions on Information Systems (TOIS), 27(1):2, 2008.
[CMZG09] Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 621–630. ACM, 2009.
[CJRY12] Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems (TOIS), 30(1):6, 2012.
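As a concrete reading of these formulas, below is a minimal sketch computing DCG/NDCG [JK00] and ERR [CMZG09] over a ranked list of graded labels (not from the slides; a base-2 logarithm and y_max = 4 are common conventions assumed here):

```python
import math

def dcg_at_k(labels, k):
    """DCG@k: Gain(d) = 2^y - 1, Discount(r) = 1/log2(r + 1)."""
    return sum((2 ** y - 1) / math.log2(r + 1)
               for r, y in enumerate(labels[:k], start=1))

def ndcg_at_k(labels, k):
    """NDCG@k: DCG@k divided by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

def err_at_k(labels, k, y_max=4):
    """ERR@k: sum over ranks of (1/r) times the probability that the user
    is satisfied at rank r and not before, with R_i = (2^y - 1)/2^y_max."""
    err, p_not_satisfied = 0.0, 1.0
    for r, y in enumerate(labels[:k], start=1):
        r_i = (2 ** y - 1) / 2 ** y_max
        err += p_not_satisfied * r_i / r
        p_not_satisfied *= 1 - r_i
    return err

print(ndcg_at_k([3, 2, 0, 1], k=4), err_at_k([3, 2, 0, 1], k=4))
```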

  9. Is It an Easy or Difficult Task?
Gradient descent cannot be applied directly: rank-based measures (NDCG, ERR, MAP, …) depend on the sorted order of the documents, so the gradient of NDCG@k with respect to a document score (and hence the model parameters) is either 0 (the sorted order did not change) or undefined (at a discontinuity where two documents swap positions).
Solution: we need a proxy loss function that
• is differentiable, and
• behaves similarly to the original cost function.
[Figure: NDCG@k plotted against document d_i's score (as a function of the model parameters): a step function with jumps where d_i overtakes d_0, d_1, d_3, d_2, together with a smooth proxy quality function approximating it.]
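One common differentiable proxy (not named on this slide) is the RankNet-style pairwise logistic loss: for a pair where d_i is more relevant than d_j, it smoothly penalizes s_j >= s_i, so gradient descent applies even though NDCG itself is a step function. A minimal sketch:

```python
import math

def pairwise_logistic_loss(score_i, score_j):
    """Smooth surrogate for 'd_i should be ranked above d_j':
    log(1 + exp(-(s_i - s_j))) is differentiable everywhere and
    decreases as the correct ordering becomes more confident."""
    return math.log(1 + math.exp(-(score_i - score_j)))

print(pairwise_logistic_loss(2.0, 1.0))  # small loss: pair ordered correctly
print(pairwise_logistic_loss(1.0, 2.0))  # large loss: pair ordered incorrectly
```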

  10. Point-Wise Algorithms
Each document is considered independently from the others: no information about other candidates for the same query is used at training time, and a cost function different from the ranking measure is optimized. Training instance: (d_i, y_i).
• Several approaches: regression, multi-class classification, ordinal regression, … [Liu11]
• Among the regression-based ones: Gradient Boosting Regression Trees (GBRT) [Fri01], where the Mean Squared Error (MSE) is the loss function being minimized.
[Liu11] Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.
[Fri01] Jerome H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
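A minimal pointwise sketch using scikit-learn's GradientBoostingRegressor (illustrative toy data; the tutorial's hands-on sessions rely on their own tools). Each (feature vector, label) pair is an independent regression example, and ranking amounts to sorting by predicted score:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[0.3, 12.0], [0.9, 3.0], [0.1, 7.5], [0.7, 9.0]])  # doc features
y = np.array([1.0, 3.0, 0.0, 2.0])  # graded relevance labels

# The default loss is squared error (MSE); query structure is ignored.
model = GradientBoostingRegressor(n_estimators=100).fit(X, y)

scores = model.predict(X)
ranking = np.argsort(-scores)  # documents sorted by descending predicted score
```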

  11. Gradient Boosting Regression Trees
Iterative algorithm: the predicted document score is an additive ensemble of weak learners, $F(d) = \sum_i f_i(d)$.
Each $f_i$ is regarded as a step in the best optimization direction, i.e., a steepest-descent step: $f_i(d) = -\rho_i\, g_i(d)$, where the step size $\rho_i$ is found by line search and the negative gradient (the pseudo-response) is $-g_i(d) = -\left[\frac{\partial L(y, f(d))}{\partial f(d)}\right]_{f = \sum_{j<i} f_j}$.
Given $L = \mathrm{MSE}/2$:
$-\frac{\partial}{\partial f(d)}\left[\tfrac{1}{2}\mathrm{MSE}(y, f(d))\right] = -\frac{\partial}{\partial f(d)}\left[\tfrac{1}{2}(y - f(d))^2\right] = y - f(d)$
The gradient $g_i$ is approximated by a regression tree $t_i$.
[Figure: the error $(y - F(d))^2$ shrinks as weak learners $t_1, t_2, t_3$ are added; each tree fits the current residual, e.g. $y - f_1(d)$ after the first tree.]
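A compact from-scratch sketch of this boosting loop, assuming $L = \mathrm{MSE}/2$ so that the pseudo-response is simply the residual $y - F(d)$. For simplicity a fixed shrinkage factor stands in for the line-search step size $\rho_i$ (illustrative only; real GBRT implementations add subsampling, regularization, etc.):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbrt(X, y, n_trees=50, shrinkage=0.1, max_depth=3):
    trees, F = [], np.zeros(len(y))       # F(d) starts at zero
    for _ in range(n_trees):
        residual = y - F                  # pseudo-response: -g_i(d) for MSE/2
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(t)
        F += shrinkage * t.predict(X)     # steepest-descent step f_i(d)
    return trees

def predict_gbrt(trees, X, shrinkage=0.1):
    return shrinkage * sum(t.predict(X) for t in trees)  # F(d) = sum_i f_i(d)
```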
