

  1. Efficiency/Effectiveness Trade-offs in Learning to Rank. Tutorial @ ICTIR 2017, http://learningtorank.isti.cnr.it/. Claudio Lucchese (Ca’ Foscari University of Venice, Venice, Italy) and Franco Maria Nardini (HPC Lab, ISTI-CNR, Pisa, Italy).

  2. The Ranking Problem. Ranking is at the core of several IR tasks:
• Document ranking in Web search
• Ads ranking in Web advertising
• Query suggestion & completion
• Product recommendation
• Song recommendation
• …

  3. The Ranking Problem. Definition: given a query q and a set of objects/documents D, rank D so as to maximize user satisfaction Q.
• Goal #1: Effectiveness. Maximize Q! But how do we measure Q?
• Goal #2: Efficiency. Make sure the ranking process is feasible and not too expensive. At Bing, "every 100msec improves revenue by 0.6%. Every millisecond counts." [KDF+13]
[KDF+13] Kohavi, R., Deng, A., Frasca, B., Walker, T., Xu, Y., and Pohlmann, N. Online controlled experiments at large scale. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1168–1176. ACM, 2013.

  4. Agenda
1. Introduction to Learning to Rank (LtR): background, algorithms, sources of cost in LtR, multi-stage ranking
2. Dealing with the efficiency/effectiveness trade-off: feature selection, enhanced learning, approximate scoring, fast scoring
3. Hands-on I: software, data, and publicly available tools; traversing regression forests, state-of-the-art tools and analysis
4. Hands-on II: training models, pruning strategies, efficient scoring
At the end of the day you’ll be able to train a high-quality ranking model and to exploit state-of-the-art tools and techniques to reduce its computational cost by up to 18x!

  5. Document Representations and Ranking
Document representations:
• A document is a multi-set of words.
• A document may have fields, it can be split into zones, and it can be enriched with external text data (e.g., anchors).
• Additional information may be useful, such as in-links, out-links, PageRank, # clicks, social links, etc.
• Public LtR datasets expose hundreds of signals.
Ranking functions:
• Term weighting [SJ72]
• Vector space model [SB88]
• BM25 [JWR00], BM25F [RZT04]
• Language modeling [PC98]
• Linear combination of features [MC07]
How to combine hundreds of signals?
[SJ72] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21, 1972.
[SB88] Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
[JWR00] Karen Spärck Jones, Steve Walker, and Stephen E. Robertson. A probabilistic model of information retrieval: development and comparative experiments. Information Processing & Management, 36(6):809–840, 2000.
[RZT04] Stephen Robertson, Hugo Zaragoza, and Michael Taylor. Simple BM25 extension to multiple weighted fields. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pages 42–49. ACM, 2004.
[PC98] Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275–281. ACM, 1998.
[MC07] Donald Metzler and W. Bruce Croft. Linear feature-based models for information retrieval. Information Retrieval, 10(3):257–274, 2007.
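To make the classical scoring functions above concrete, here is a minimal sketch of BM25 [JWR00] (not part of the original slides; k1 = 1.2 and b = 0.75 are customary defaults, and all names and corpus statistics are illustrative):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs,
               k1=1.2, b=0.75):
    """Minimal BM25: for each query term, IDF times a saturated,
    length-normalized term frequency."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        idf = math.log(1 + (num_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```

Each signal of this kind (BM25, language-model scores, PageRank, click counts, …) becomes one feature of a query-document pair; LtR is about learning how to combine them.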

  6. Ranking as a Supervised Learning Task
A training instance is a query q together with its candidate documents d_1, d_2, d_3, …, d_i and their relevance labels y_1, y_2, y_3, …, y_i. A machine learning algorithm (neural net, SVM, decision tree) minimizes a loss function over such instances and outputs a ranking model.
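A minimal sketch of this training-instance layout (the class and field names are illustrative assumptions, not from the tutorial):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingInstance:
    """One query with its candidate documents and their relevance labels."""
    query_id: str
    features: List[List[float]]  # one feature vector per candidate document
    labels: List[int]            # one graded relevance judgment per document

# Toy instance: three candidate documents described by two features each.
instance = TrainingInstance(
    query_id="q1",
    features=[[0.3, 12.0], [0.9, 3.0], [0.1, 7.5]],
    labels=[1, 3, 0],
)
```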

  7. Relevance Labels
Query/document representation: useful signals include
• Link analysis [H+00]
• Term proximity [RS03]
• Query classification [BSD10]
• Query intent mining [JLN16, LOP+13]
• Finding entities in documents [MW08] and in queries [BOM15]
• Document recency [DZK+10]
• Distributed representations of words and their compositionality [MSC+13]
• Convolutional neural networks [SHG+14]
• …
Label generation:
• Explicit feedback: thousands of search quality raters; absolute vs. relative judgments [CBCD08]
• Implicit feedback: clicks/query chains [JGP+05, Joa02, RJ05]; de-biasing/click models [JSS17]
• Minimizing annotation cost: active learning [LCZ+10]; deep versus shallow labelling [YR09]

  8. Evaluation Measures for Ranking
Many are of the form $Q@k = \sum_{r=1}^{k} \mathrm{Gain}(d_r) \cdot \mathrm{Discount}(r)$:
• (N)DCG [JK00]: $\mathrm{Gain}(d) = 2^y - 1$, $\mathrm{Discount}(r) = 1/\log(r+1)$
• RBP [MZ08]: $\mathrm{Gain}(d) = I(y)$, $\mathrm{Discount}(r) = (1-p)\,p^{r-1}$
• ERR [CMZG09]: $\mathrm{Gain}(d_i) = R_i \prod_{j=1}^{i-1}(1 - R_j)$ with $R_i = (2^y - 1)/2^{y_{\max}}$, $\mathrm{Discount}(r) = 1/r$
Do they match user satisfaction?
• ERR correlates better with user satisfaction (clicks and editorial judgments) [CMZG09]
• Interleaving of results can be used to compare two rankings [CJRY12]
• "Major revisions of the web search rankers [Bing] ... The differences between these rankers involve changes of over half a percentage point, in absolute terms, of NDCG."
[JK00] Kalervo Järvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41–48. ACM, 2000.
[MZ08] Alistair Moffat and Justin Zobel. Rank-biased precision for measurement of retrieval effectiveness. ACM Transactions on Information Systems (TOIS), 27(1):2, 2008.
[CMZG09] Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 621–630. ACM, 2009.
[CJRY12] Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems (TOIS), 30(1):6, 2012.
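As a concrete reading of these formulas, below is a minimal sketch computing DCG/NDCG [JK00] and ERR [CMZG09] over a ranked list of graded labels (not from the slides; a base-2 logarithm and y_max = 4 are common conventions assumed here):

```python
import math

def dcg_at_k(labels, k):
    """DCG@k: Gain(d) = 2^y - 1, Discount(r) = 1/log2(r + 1)."""
    return sum((2 ** y - 1) / math.log2(r + 1)
               for r, y in enumerate(labels[:k], start=1))

def ndcg_at_k(labels, k):
    """NDCG@k: DCG@k divided by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

def err_at_k(labels, k, y_max=4):
    """ERR@k: sum over ranks of (1/r) times the probability that the user
    is satisfied at rank r and not before, with R_i = (2^y - 1)/2^y_max."""
    err, p_not_satisfied = 0.0, 1.0
    for r, y in enumerate(labels[:k], start=1):
        r_i = (2 ** y - 1) / 2 ** y_max
        err += p_not_satisfied * r_i / r
        p_not_satisfied *= 1 - r_i
    return err

print(ndcg_at_k([3, 2, 0, 1], k=4), err_at_k([3, 2, 0, 1], k=4))
```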

  9. Is It an Easy or Difficult Task?
Gradient descent cannot be applied directly: rank-based measures (NDCG, ERR, MAP, …) depend on the sorted order of the documents, so the gradient of NDCG@k with respect to a document score (and hence the model parameters) is either 0 (the sorted order did not change) or undefined (at a discontinuity where two documents swap positions).
Solution: we need a proxy loss function that
• is differentiable, and
• behaves similarly to the original cost function.
[Figure: NDCG@k plotted against document d_i's score (as a function of the model parameters): a step function with jumps where d_i overtakes d_0, d_1, d_3, d_2, together with a smooth proxy quality function approximating it.]
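One common differentiable proxy (not named on this slide) is the RankNet-style pairwise logistic loss: for a pair where d_i is more relevant than d_j, it smoothly penalizes s_j >= s_i, so gradient descent applies even though NDCG itself is a step function. A minimal sketch:

```python
import math

def pairwise_logistic_loss(score_i, score_j):
    """Smooth surrogate for 'd_i should be ranked above d_j':
    log(1 + exp(-(s_i - s_j))) is differentiable everywhere and
    decreases as the correct ordering becomes more confident."""
    return math.log(1 + math.exp(-(score_i - score_j)))

print(pairwise_logistic_loss(2.0, 1.0))  # small loss: pair ordered correctly
print(pairwise_logistic_loss(1.0, 2.0))  # large loss: pair ordered incorrectly
```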

  10. Point-Wise Algorithms
Each document is considered independently from the others: no information about other candidates for the same query is used at training time, and a cost function different from the ranking measure is optimized. Training instance: (d_i, y_i).
• Several approaches: regression, multi-class classification, ordinal regression, … [Liu11]
• Among the regression-based ones: Gradient Boosting Regression Trees (GBRT) [Fri01], where the Mean Squared Error (MSE) is the loss function being minimized.
[Liu11] Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.
[Fri01] Jerome H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
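A minimal pointwise sketch using scikit-learn's GradientBoostingRegressor (illustrative toy data; the tutorial's hands-on sessions rely on their own tools). Each (feature vector, label) pair is an independent regression example, and ranking amounts to sorting by predicted score:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[0.3, 12.0], [0.9, 3.0], [0.1, 7.5], [0.7, 9.0]])  # doc features
y = np.array([1.0, 3.0, 0.0, 2.0])  # graded relevance labels

# The default loss is squared error (MSE); query structure is ignored.
model = GradientBoostingRegressor(n_estimators=100).fit(X, y)

scores = model.predict(X)
ranking = np.argsort(-scores)  # documents sorted by descending predicted score
```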

  11. Gradient Boosting Regression Trees
Iterative algorithm: the predicted document score is an additive ensemble of weak learners, $F(d) = \sum_i f_i(d)$.
Each $f_i$ is regarded as a step in the best optimization direction, i.e., a steepest-descent step: $f_i(d) = -\rho_i\, g_i(d)$, where the step size $\rho_i$ is found by line search and the negative gradient (the pseudo-response) is $-g_i(d) = -\left[\frac{\partial L(y, f(d))}{\partial f(d)}\right]_{f = \sum_{j<i} f_j}$.
Given $L = \mathrm{MSE}/2$:
$-\frac{\partial}{\partial f(d)}\left[\tfrac{1}{2}\mathrm{MSE}(y, f(d))\right] = -\frac{\partial}{\partial f(d)}\left[\tfrac{1}{2}(y - f(d))^2\right] = y - f(d)$
The gradient $g_i$ is approximated by a regression tree $t_i$.
[Figure: the error $(y - F(d))^2$ shrinks as weak learners $t_1, t_2, t_3$ are added; each tree fits the current residual, e.g. $y - f_1(d)$ after the first tree.]
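A compact from-scratch sketch of this boosting loop, assuming $L = \mathrm{MSE}/2$ so that the pseudo-response is simply the residual $y - F(d)$. For simplicity a fixed shrinkage factor stands in for the line-search step size $\rho_i$ (illustrative only; real GBRT implementations add subsampling, regularization, etc.):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbrt(X, y, n_trees=50, shrinkage=0.1, max_depth=3):
    trees, F = [], np.zeros(len(y))       # F(d) starts at zero
    for _ in range(n_trees):
        residual = y - F                  # pseudo-response: -g_i(d) for MSE/2
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(t)
        F += shrinkage * t.predict(X)     # steepest-descent step f_i(d)
    return trees

def predict_gbrt(trees, X, shrinkage=0.1):
    return shrinkage * sum(t.predict(X) for t in trees)  # F(d) = sum_i f_i(d)
```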
