Learning to Rank: From Pairwise Approach to Listwise Approach

  1. Learning to Rank: From Pairwise Approach to Listwise Approach. Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li. Microsoft Research Asia, Beijing (2007). Presented by Christian Kümmerle (University of Virginia, TU Munich), December 2, 2014.

  2. Content: 1 Framework: Learning to Rank; 2 The Listwise Approach; 3 Loss function based on a probability model; 4 ListNet algorithm; 5 Experiments and Conclusion.

  3. Framework: Learning to Rank. What is Learning to Rank?

  4. Framework: Learning to Rank. What is Learning to Rank? Classical IR ranking task: given a query, rank the documents into a list. Query-dependent ranking functions: vector space model, BM25, language model.

  5. Framework: Learning to Rank. What is Learning to Rank? Classical IR ranking task: given a query, rank the documents into a list. Query-dependent ranking functions: vector space model, BM25, language model. Query-independent features of documents, e.g. PageRank, or URL depth (e.g. http://sifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/lectures.html has a depth of 4).

  6. Framework: Learning to Rank. What is Learning to Rank? Classical IR ranking task: given a query, rank the documents into a list. Query-dependent ranking functions: vector space model, BM25, language model. Query-independent features of documents, e.g. PageRank, or URL depth (e.g. http://sifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/lectures.html has a depth of 4). → How can we combine all of these "features" in order to get a better ranking function?
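URL depth, as used above, can be computed in one plausible way by counting path segments; a minimal sketch in Python (the helper name url_depth is ours, not from the paper):

```python
from urllib.parse import urlparse

def url_depth(url):
    """Query-independent feature: number of non-empty path segments."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

# The slide's example URL has 4 segments: ~wang296, Course, IR_Fall, lectures.html
print(url_depth("http://sifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/lectures.html"))  # 4
```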

  7. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning:

  8. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning: input space, output space, hypothesis space, loss function.

  9. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning, as set up in the authors' paper: Input space: $X = \{x^{(1)}, x^{(2)}, \ldots\}$, where $x^{(i)}$ is the list of feature representations of the documents for query $q^{(i)}$ (← listwise approach). Output space: $Y = \{y^{(1)}, y^{(2)}, \ldots\}$, where $y^{(i)}$ is the list of judgements of the relevance degree of the documents for $q^{(i)}$ (← listwise approach). Hypothesis space. Loss function.

  10. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning, as set up in the authors' paper: Input space: $X = \{x^{(1)}, x^{(2)}, \ldots\}$, where $x^{(i)}$ is the list of feature representations of the documents for query $q^{(i)}$ (← listwise approach). Output space: $Y = \{y^{(1)}, y^{(2)}, \ldots\}$, where $y^{(i)}$ is the list of judgements of the relevance degree of the documents for $q^{(i)}$ (← listwise approach). Hypothesis space ← neural network. Loss function.

  11. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning, as set up in the authors' paper: Input space: $X = \{x^{(1)}, x^{(2)}, \ldots\}$, where $x^{(i)}$ is the list of feature representations of the documents for query $q^{(i)}$ (← listwise approach). Output space: $Y = \{y^{(1)}, y^{(2)}, \ldots\}$, where $y^{(i)}$ is the list of judgements of the relevance degree of the documents for $q^{(i)}$ (← listwise approach). Hypothesis space ← neural network. Loss function: probability model on the space of permutations.
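As a concrete illustration of the hypothesis space: any function mapping a document's feature vector to a real-valued score fits the framework. A minimal sketch with a linear scorer follows; this is our simplification, while the paper itself trains a neural network in this role:

```python
import numpy as np

# Minimal sketch of one member of the hypothesis space: a linear scoring
# function f_w(x) = <w, x>. This is a stand-in of ours; ListNet trains a
# neural network that plays the same role.
class LinearScorer:
    def __init__(self, n_features, seed=0):
        self.w = np.random.default_rng(seed).normal(scale=0.01, size=n_features)

    def __call__(self, x):
        """x: (n_docs, n_features) feature matrix -> (n_docs,) scores z."""
        return x @ self.w
```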

  12. The Listwise Approach. Queries: $Q = \{q^{(1)}, q^{(2)}, \ldots, q^{(m)}\}$, a set of $m$ queries. List of documents: for query $q^{(i)}$, there are $n_i$ documents $d^{(i)} = (d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{n_i})$.

  13. The Listwise Approach. Queries: $Q = \{q^{(1)}, q^{(2)}, \ldots, q^{(m)}\}$, a set of $m$ queries. List of documents: for query $q^{(i)}$, there are $n_i$ documents $d^{(i)} = (d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{n_i})$. Feature representation in the input space: $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_{n_i})$ with $x^{(i)}_j = \Psi(q^{(i)}, d^{(i)}_j)$, e.g. $x^{(i)}_j = (\mathrm{BM25}(q^{(i)}, d^{(i)}_j), \mathrm{LM}(q^{(i)}, d^{(i)}_j), \mathrm{TFIDF}(q^{(i)}, d^{(i)}_j), \mathrm{PageRank}(d^{(i)}_j), \mathrm{URLdepth}(d^{(i)}_j)) \in \mathbb{R}^5$.

  14. The Listwise Approach. Queries: $Q = \{q^{(1)}, q^{(2)}, \ldots, q^{(m)}\}$, a set of $m$ queries. List of documents: for query $q^{(i)}$, there are $n_i$ documents $d^{(i)} = (d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{n_i})$. Feature representation in the input space: $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_{n_i})$ with $x^{(i)}_j = \Psi(q^{(i)}, d^{(i)}_j)$, e.g. $x^{(i)}_j = (\mathrm{BM25}(q^{(i)}, d^{(i)}_j), \mathrm{LM}(q^{(i)}, d^{(i)}_j), \mathrm{TFIDF}(q^{(i)}, d^{(i)}_j), \mathrm{PageRank}(d^{(i)}_j), \mathrm{URLdepth}(d^{(i)}_j)) \in \mathbb{R}^5$. List of judgement scores in the output space: $y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_{n_i})$ with implicitly or explicitly given judgement scores $y^{(i)}_j$ for all documents corresponding to query $q^{(i)}$. → Training data set $\mathcal{T} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$.
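A minimal sketch of assembling such a training set $\mathcal{T}$ in code, under the assumption that per-feature scorers (BM25, language model, etc.) are available as callables; all names here are ours, not the paper's:

```python
import numpy as np

def build_training_set(queries, docs_per_query, labels_per_query, scorers):
    """Build T = [(x^(i), y^(i))]: one feature matrix and one judgement
    vector per query. scorers: callables (query, doc) -> float, e.g.
    BM25, LM, TFIDF, PageRank, URL depth."""
    T = []
    for q, docs, y in zip(queries, docs_per_query, labels_per_query):
        x = np.array([[s(q, d) for s in scorers] for d in docs])  # (n_i, #features)
        T.append((x, np.asarray(y, dtype=float)))                 # y^(i): (n_i,)
    return T

# Toy usage with stand-in scorers (a real system plugs in BM25 and friends):
scorers = [
    lambda q, d: float(len(set(q.split()) & set(d.split()))),  # crude term overlap
    lambda q, d: float(len(d)),                                # document length
]
T = build_training_set(
    queries=["learning to rank"],
    docs_per_query=[["rank learning notes", "unrelated page"]],
    labels_per_query=[[3, 0]],
    scorers=scorers,
)
```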

  15. What is a meaningful loss function? We want to find a function $f: X \to Y$ such that the $f(x^{(i)})$ are "not very different" from the $y^{(i)}$. → The loss function should penalize overly large differences.

  16. What is a meaningful loss function? We want to find a function $f: X \to Y$ such that the $f(x^{(i)})$ are "not very different" from the $y^{(i)}$. → The loss function should penalize overly large differences. Idea: just take NDCG! The perfectly ordered list can be derived from the given judgements $y^{(i)}$. Problem: NDCG is discontinuous with respect to the ranking scores, since NDCG is position based. Example: for a training query with judgements $y^{(i)} = (2, 1, 4, 3)$, the scores $f(x^{(i)}) = (1.2, 0.7, 3.110, 3.109)$ induce the perfect ranking, so NDCG $= 1$; nudging the last score from $3.109$ to $3.111$ swaps the top two documents, and NDCG drops to $0.86$.
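The slide's values can be reproduced with the common NDCG variant using gain $2^y - 1$ and discount $1/\log_2(\mathrm{rank} + 1)$ (our assumption about the variant used); a minimal sketch:

```python
import numpy as np

def ndcg(y_true, scores):
    """NDCG with gain 2^y - 1 and discount 1/log2(rank + 1)."""
    y = np.asarray(y_true, dtype=float)
    order = np.argsort(scores)[::-1]   # ranking induced by the model scores
    ideal = np.argsort(y)[::-1]        # perfect ranking from the judgements
    disc = 1.0 / np.log2(np.arange(2, len(y) + 2))
    gains = 2.0 ** y - 1
    return (gains[order] @ disc) / (gains[ideal] @ disc)

y = [2, 1, 4, 3]
print(ndcg(y, [1.2, 0.7, 3.110, 3.109]))  # 1.0
print(ndcg(y, [1.2, 0.7, 3.110, 3.111]))  # ~0.86: a 0.002 score change, a 0.14 NDCG jump
```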

  17. Loss function based on a probability model on permutations. Solution: define probability distributions $P_{y^{(i)}}$ and $P_{z^{(i)}}$ (for $z^{(i)} := (f(x^{(i)}_1), \ldots, f(x^{(i)}_{n_i}))$) on the set of permutations $\pi$ of $\{1, \ldots, n_i\}$, and take the cross entropy, which equals the KL divergence up to an additive constant, as the loss function:

$$L(y^{(i)}, z^{(i)}) := -\sum_{\pi} P_{y^{(i)}}(\pi) \log P_{z^{(i)}}(\pi) \propto \mathrm{KL}\big(P_{y^{(i)}}(\cdot) \,\|\, P_{z^{(i)}}(\cdot)\big)$$
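The "$\propto$" can be made precise: the cross entropy and the KL divergence differ only by the entropy of $P_{y^{(i)}}$, which does not depend on the model scores $z^{(i)}$, so minimizing one minimizes the other:

$$-\sum_{\pi} P_{y^{(i)}}(\pi)\log P_{z^{(i)}}(\pi) = \underbrace{-\sum_{\pi} P_{y^{(i)}}(\pi)\log P_{y^{(i)}}(\pi)}_{H(P_{y^{(i)}}),\ \text{independent of } z^{(i)}} + \mathrm{KL}\big(P_{y^{(i)}}(\cdot) \,\|\, P_{z^{(i)}}(\cdot)\big)$$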

  18. Loss function based on a probability model on permutations. How do we define the probability distribution? E.g. for the set of permutations of $\{1, 2, 3\}$, the scores $(y_1, y_2, y_3)$ and the permutation $\pi := (1, 3, 2)$, i.e. document 1 ranked first, document 3 second, document 2 third:

$$P_y(\pi) := \frac{e^{y_1}}{e^{y_1} + e^{y_2} + e^{y_3}} \cdot \frac{e^{y_3}}{e^{y_2} + e^{y_3}} \cdot \frac{e^{y_2}}{e^{y_2}}$$

Definition: if $\pi$ is a permutation of $\{1, \ldots, n\}$, its probability, given the list of scores $y$ of length $n$, is

$$P_y(\pi) = \prod_{j=1}^{n} \frac{\exp\big(y_{\pi^{-1}(j)}\big)}{\sum_{l=j}^{n} \exp\big(y_{\pi^{-1}(l)}\big)}$$
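A minimal sketch (ours, not the authors' implementation) of the permutation probability above together with the loss from the previous slide; enumerating all $n!$ permutations is feasible only for tiny lists, which is why the paper later works with top-one probabilities instead:

```python
from itertools import permutations
import numpy as np

def permutation_probability(y, pi):
    """P_y(pi), with pi[j] = 0-based index of the object ranked at position j."""
    s = np.exp(np.asarray(y, dtype=float)[list(pi)])  # exp-scores in ranked order
    # Denominator at position j: sum of exp-scores of all not-yet-ranked objects.
    return float(np.prod(s / np.cumsum(s[::-1])[::-1]))

def listwise_loss(y, z):
    """L(y, z) = -sum_pi P_y(pi) log P_z(pi), summed over all permutations."""
    perms = permutations(range(len(y)))
    return -sum(permutation_probability(y, p) * np.log(permutation_probability(z, p))
                for p in perms)

y = [2.0, 0.5, 1.0]
print(permutation_probability(y, (0, 2, 1)))  # pi = (1, 3, 2) in 1-based notation
print(sum(permutation_probability(y, p)       # the P_y(pi) sum to 1
          for p in permutations(range(3))))
print(listwise_loss(y, [2.0, 0.5, 1.0]))      # minimal when P_z matches P_y
```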
