Learning Maximal Marginal Relevance

Learning Maximal Marginal Relevance - PowerPoint PPT Presentation

Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures. Long Xia, Jun Xu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng.


  1. Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures
  Long Xia, Jun Xu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng
  Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences

  2. Outline
  • Background
  • Related work
  • Our approach
  • Experiments
  • Summary

  3. (figure slide; no text captured)

  4. Problem of diversity

  5. Outline
  • Background
  • Related work
  • Our approach
  • Experiments
  • Summary

  6. Related work
  Categories: heuristic approaches, learning approaches, diversity evaluation measures
  Heuristic approaches:
  • Maximal marginal relevance (MMR): a heuristic criterion (Carbonell and Goldstein, SIGIR'98)
  • Select documents with high divergence (Zhai et al., SIGIR'03)
  • Minimize the risk of dissatisfaction of the average user (Agrawal et al., WSDM'09)
  • Diversity by proportionality: an election-based approach (Dang and Croft, SIGIR'12)
  • …

  7. Related work
  Categories: heuristic approaches, learning approaches, diversity evaluation measures
  Heuristic approaches:
  • Maximal marginal relevance (MMR): a heuristic criterion (Carbonell and Goldstein, SIGIR'98)
  • Select documents with high divergence (Zhai et al., SIGIR'03)
  • Minimize the risk of dissatisfaction of the average user (Agrawal et al., WSDM'09)
  • Diversity by proportionality: an election-based approach (Dang and Croft, SIGIR'12)
  • …
  Learning approaches:
  • SVM-DIV: formulate the task as a problem of predicting diverse subsets (Yue and Joachims, ICML'08)
  • REC & RBA: online learning algorithms based on users' clicking behavior (Radlinski et al., ICML'08)
  • R-LTR: a process of sequential document selection, optimizing the likelihood of ground-truth rankings (Zhu et al., SIGIR'14)
  • …

  8. Related work
  Categories: heuristic approaches, learning approaches, diversity evaluation measures
  Heuristic approaches:
  • Maximal marginal relevance (MMR): a heuristic criterion (Carbonell and Goldstein, SIGIR'98)
  • Select documents with high divergence (Zhai et al., SIGIR'03)
  • Minimize the risk of dissatisfaction of the average user (Agrawal et al., WSDM'09)
  • Diversity by proportionality: an election-based approach (Dang and Croft, SIGIR'12)
  • …
  Learning approaches:
  • SVM-DIV: formulate the task as a problem of predicting diverse subsets (Yue and Joachims, ICML'08)
  • REC & RBA: online learning algorithms based on users' clicking behavior (Radlinski et al., ICML'08)
  • R-LTR: a process of sequential document selection, optimizing the likelihood of ground-truth rankings (Zhu et al., SIGIR'14)
  • …
  Diversity evaluation measures:
  • Subtopic recall (Zhai et al., SIGIR'03)
  • α-NDCG (Clarke et al., SIGIR'08)
  • ERR-IA (Chapelle et al., CIKM'09)
  • NRBP (Clarke et al., ICTIR'09)
  • …

  9. Maximal marginal relevance (Carbonell and Goldstein, SIGIR'98)
  $\mathrm{MMR} \triangleq \arg\max_{D_i \in R \setminus S} \left[ \lambda\, \mathrm{Sim}_1(D_i, Q) - (1 - \lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \right]$
  where $\mathrm{Sim}_1$ is the query-document similarity (relevance) and $\mathrm{Sim}_2$ is the similarity with the already-selected documents
  • Advantage: models top-down user browsing behavior
  • Disadvantage: non-learning, so only a limited number of ranking signals; high parameter-tuning cost
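Read procedurally, the MMR criterion is a greedy loop: at each step, select the candidate that maximizes relevance minus its maximal similarity to the documents already chosen. A minimal sketch of that loop (function name and the precomputed similarity inputs are hypothetical, not from the slides):

```python
import numpy as np

def mmr_rank(sim_q, sim_dd, lam=0.5, k=None):
    """Greedy MMR ranking.

    sim_q  : (n,) query-document similarities Sim1(D_i, Q)
    sim_dd : (n, n) document-document similarities Sim2(D_i, D_j)
    lam    : trade-off lambda between relevance and novelty
    """
    n = len(sim_q)
    k = n if k is None else k
    selected, remaining = [], set(range(n))
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in remaining:
            # penalty: max similarity to any already-selected document
            penalty = max((sim_dd[i, j] for j in selected), default=0.0)
            score = lam * sim_q[i] - (1 - lam) * penalty
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate relevant documents, the second duplicate is pushed below a less relevant but novel one, which is exactly the behavior the criterion is designed for.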

  10. Relational Learning-to-Rank (Zhu et al., SIGIR'14)
  • Formalization
  • Four key components: input space, output space, ranking function f, loss function L
  $\hat{f} = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{N} L\left(f; X^{(i)}, R^{(i)}, \mathbf{y}^{(i)}\right)$

  11. Relational Learning-to-Rank (Zhu et al., SIGIR'14)
  • Formalization
  • Four key components: input space, output space, ranking function f, loss function L
  $\hat{f} = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{N} L\left(f; X^{(i)}, R^{(i)}, \mathbf{y}^{(i)}\right)$
  • Definition of the ranking function
  $f_S(\mathbf{x}_j, R_j) = \omega_r^{T} \mathbf{x}_j + \omega_d^{T} h_S(R_j), \quad \forall \mathbf{x}_j \in X \setminus S$
  (first term: relevance score; second term: diversity score)

  12. Relational Learning-to-Rank (Zhu et al., SIGIR'14)
  • Formalization
  • Four key components: input space, output space, ranking function f, loss function L
  $\hat{f} = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{N} L\left(f; X^{(i)}, R^{(i)}, \mathbf{y}^{(i)}\right)$
  • Definition of the ranking function
  $f_S(\mathbf{x}_j, R_j) = \omega_r^{T} \mathbf{x}_j + \omega_d^{T} h_S(R_j), \quad \forall \mathbf{x}_j \in X \setminus S$
  (first term: relevance score; second term: diversity score; $h_S$ is the relational function applied to $R_j$, the matrix of relationships between document $\mathbf{x}_j$ and the other documents)
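The ranking function is just a linear combination of a relevance feature vector and a diversity summary of the candidate's relations to the already-selected set. A small sketch, assuming a hypothetical choice of $h_S$ (element-wise minimum over the selected documents; the function and argument names are illustrative, not the paper's API):

```python
import numpy as np

def rltr_score(x_j, R_j_sel, w_r, w_d):
    """Score of candidate j given the already-selected set S.

    x_j     : (p,) relevance features of document j
    R_j_sel : (|S|, q) relation features between j and each selected doc
    w_r,w_d : learned weight vectors for the relevance / diversity parts
    """
    # h_S: aggregate relations to selected docs (assumed: element-wise min);
    # with an empty S the diversity term contributes nothing
    h = R_j_sel.min(axis=0) if len(R_j_sel) else np.zeros_like(w_d)
    return w_r @ x_j + w_d @ h
```

At each position of the ranking, this score would be recomputed for every remaining candidate and the maximizer appended to S, mirroring the sequential selection described on the slide.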

  13. Relational Learning-to-Rank (Zhu et al., SIGIR'14)
  • Definition of the loss function
  $L(f(X, R), \mathbf{y}) = -\log P(\mathbf{y}|X)$, where $P(\mathbf{y}|X) = P\left(x_{y(1)}, x_{y(2)}, \cdots, x_{y(n)} \mid X\right)$
  • Plackett-Luce based probability
  $P(\mathbf{y}|X) = \prod_{j=1}^{n} \frac{\exp\left(f_{S_{j-1}}(x_{y(j)}, R_{y(j)})\right)}{\sum_{k=j}^{n} \exp\left(f_{S_{j-1}}(x_{y(k)}, R_{y(k)})\right)}$
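The Plackett-Luce probability factorizes rank by rank: at position j, the chosen document's exponentiated score is normalized over the scores of all documents not yet placed. A minimal sketch, assuming the sequential scores $f_{S_{j-1}}(x_{y(j)}, R_{y(j)})$ have already been computed in ranked order:

```python
import math

def plackett_luce_log_prob(scores):
    """log P(y|X) under the Plackett-Luce model.

    scores : list of sequential scores, in ranked order, where scores[j]
             is the model score of the document placed at rank j
             (conditioned on the prefix S_{j-1}).
    """
    logp = 0.0
    for j in range(len(scores)):
        # softmax denominator over the documents still unranked at step j
        denom = sum(math.exp(s) for s in scores[j:])
        logp += scores[j] - math.log(denom)
    return logp
```

As a sanity check, equal scores over n documents give probability 1/n!, e.g. $-\log 6$ for three documents.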

  14. Relational Learning-to-Rank (Zhu et al., SIGIR'14)
  • R-LTR Pros:
  • Models sequential user behavior in the MMR way
  • A learnable framework for combining complex features
  • State-of-the-art empirical performance
  Can R-LTR be further improved?

  15. Motivation
  • R-LTR Cons:
  • Only utilizes "positive" rankings, treating all "negative" rankings equally
  • Not all negative rankings are equally negative (they have different evaluation scores)
  • How about using discriminative learning, which is effective in many machine learning tasks?
  • The learning objective differs from the diversity evaluation measures
  • How about directly optimizing the evaluation measures?

  16. Major Idea
  Learn the MMR model using both positive and negative rankings
  How to achieve this?
  Optimize diversity evaluation measures

  17. Outline
  • Background
  • Related work
  • Our approach
  • Experiments
  • Summary

  18. Learning the ranking model
  Basic loss function:
  $\min_{f} \sum_{n=1}^{N} L\left(\mathbf{y}^{(n)}, \mathbf{J}^{(n)}\right)$
  where $\mathbf{y}^{(n)}$ is the ranking constructed by the maximal marginal relevance model, $\mathbf{J}^{(n)}$ denotes the human labels on the documents, and $L(\mathbf{y}^{(n)}, \mathbf{J}^{(n)})$ is the function judging the 'loss' of the predicted ranking $\mathbf{y}^{(n)}$ compared with the human labels $\mathbf{J}^{(n)}$

  19. Evaluation measures as loss function
  • Aim to maximize the diverse ranking accuracy in terms of a diversity evaluation measure on the training data
  $\sum_{n=1}^{N} \left(1 - E\left(X^{(n)}, \mathbf{y}^{(n)}, \mathbf{J}^{(n)}\right)\right)$
  where $E$ represents the evaluation measure, which measures the agreement between the ranking $\mathbf{y}$ over the documents in $X$ and the human judgements $\mathbf{J}$
  • Difficult to directly optimize this loss, as $E$ is a non-convex function

  20. Evaluation measures as loss function
  • Resort to optimizing an upper bound of the loss function
  $\sum_{n=1}^{N} \left(1 - E\left(X^{(n)}, \mathbf{y}^{(n)}, \mathbf{J}^{(n)}\right)\right)$
  is upper bounded by
  $\sum_{n=1}^{N} \max_{\mathbf{y}^+ \in Y^{+(n)},\, \mathbf{y}^- \in Y^{-(n)}} \left[ E\left(X^{(n)}, \mathbf{y}^+, \mathbf{J}^{(n)}\right) - E\left(X^{(n)}, \mathbf{y}^-, \mathbf{J}^{(n)}\right) \right] \cdot \mathbb{1}\left[ G\left(X^{(n)}, R^{(n)}, \mathbf{y}^+\right) \le G\left(X^{(n)}, R^{(n)}, \mathbf{y}^-\right) \right]$
  where $\mathbb{1}[\cdot]$ is one if the condition is satisfied and zero otherwise; $Y^{+(n)}$ denotes the positive rankings and $Y^{-(n)}$ the negative rankings
  $G(X, R, \mathbf{y})$ is the query-level ranking model:
  $G(X, R, \mathbf{y}) = \Pr(\mathbf{y}|X, R) = \Pr\left(\mathbf{x}_{y(1)} \cdots \mathbf{x}_{y(M)} \mid X, R\right) = \prod_{r=1}^{M-1} \Pr\left(\mathbf{x}_{y(r)} \mid X, S_{r-1}, R\right) = \prod_{r=1}^{M-1} \frac{\exp\left(f_{S_{r-1}}(\mathbf{x}_{y(r)}, R_{y(r)})\right)}{\sum_{k=r}^{M} \exp\left(f_{S_{r-1}}(\mathbf{x}_{y(k)}, R_{y(k)})\right)}$
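The per-query term of the bound pays the measure gap $E(\mathbf{y}^+) - E(\mathbf{y}^-)$ whenever the model scores a negative ranking at least as high as a positive one. A small sketch of that term (function and argument names are illustrative; the measure values and model scores are assumed precomputed):

```python
def pairwise_bound(E_pos, E_neg, G_pos, G_neg):
    """Per-query upper-bound term: max over (y+, y-) pairs of
    (E(y+) - E(y-)) * 1[G(y+) <= G(y-)].

    E_pos, G_pos : evaluation-measure values and model scores G of the
                   positive rankings; E_neg, G_neg : same for negatives.
    """
    best = 0.0
    for e_p, g_p in zip(E_pos, G_pos):
        for e_n, g_n in zip(E_neg, G_neg):
            if g_p <= g_n:  # the model mis-orders this pair
                best = max(best, e_p - e_n)
    return best
```

The term is zero exactly when every positive ranking outscores every negative one, so driving this bound down forces the model to separate good diverse rankings from bad ones by at least their measure gap.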

  21. Evaluation measures as loss function
  • Resort to optimizing an upper bound of the loss function
  $\sum_{n=1}^{N} \left(1 - E\left(X^{(n)}, \mathbf{y}^{(n)}, \mathbf{J}^{(n)}\right)\right)$
  is upper bounded by
  $\sum_{n=1}^{N} \max_{\mathbf{y}^+ \in Y^{+(n)},\, \mathbf{y}^- \in Y^{-(n)}} \left[ E\left(X^{(n)}, \mathbf{y}^+, \mathbf{J}^{(n)}\right) - E\left(X^{(n)}, \mathbf{y}^-, \mathbf{J}^{(n)}\right) \right] \cdot \mathbb{1}\left[ G\left(X^{(n)}, R^{(n)}, \mathbf{y}^+\right) \le G\left(X^{(n)}, R^{(n)}, \mathbf{y}^-\right) \right]$
  where $\mathbb{1}[\cdot]$ is one if the condition is satisfied and zero otherwise; $Y^{+(n)}$ denotes the positive rankings and $Y^{-(n)}$ the negative rankings
  which, if $E \in [1, 2]$, is further upper bounded by
  $\sum_{n=1}^{N} \sum_{\mathbf{y}^+ \in Y^{+(n)},\, \mathbf{y}^- \in Y^{-(n)}} \mathbb{1}\left[ G\left(X^{(n)}, R^{(n)}, \mathbf{y}^+\right) - G\left(X^{(n)}, R^{(n)}, \mathbf{y}^-\right) \le E\left(X^{(n)}, \mathbf{y}^+, \mathbf{J}^{(n)}\right) - E\left(X^{(n)}, \mathbf{y}^-, \mathbf{J}^{(n)}\right) \right]$
  $G(X, R, \mathbf{y})$ is the query-level ranking model:
  $G(X, R, \mathbf{y}) = \Pr(\mathbf{y}|X, R) = \Pr\left(\mathbf{x}_{y(1)} \cdots \mathbf{x}_{y(M)} \mid X, R\right) = \prod_{r=1}^{M-1} \Pr\left(\mathbf{x}_{y(r)} \mid X, S_{r-1}, R\right) = \prod_{r=1}^{M-1} \frac{\exp\left(f_{S_{r-1}}(\mathbf{x}_{y(r)}, R_{y(r)})\right)}{\sum_{k=r}^{M} \exp\left(f_{S_{r-1}}(\mathbf{x}_{y(k)}, R_{y(k)})\right)}$

