SLIDE 1

Learning to Rank: From Pairwise Approach to Listwise Approach

Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li

Microsoft Research Asia, Beijing (2007)

Presented by Christian Kümmerle December 2, 2014

SLIDE 2

Content

1. Framework: Learning to Rank
2. The Listwise Approach
3. Loss function based on probability model
4. ListNet algorithm
5. Experiments and Conclusion

SLIDE 6

Framework: Learning to Rank

What is Learning to Rank?

- Classical IR ranking task: given a query, rank the documents into a list.
- Query-dependent ranking functions: vector space model, BM25, language model.
- Query-independent features of documents, e.g.:
  - PageRank
  - URL depth: e.g. http://sifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/lectures.html has a depth of 4.

→ How can we combine all these "features" in order to get a better ranking function?

SLIDE 11

Framework: Learning to Rank

What is Learning to Rank?

Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents.

Supervised learning, in the authors' paper:

- Input space: $X = \{x^{(1)}, x^{(2)}, \ldots\}$, where $x^{(i)}$ is the list of feature representations of the documents for query $q_i$ ← listwise approach
- Output space: $Y = \{y^{(1)}, y^{(2)}, \ldots\}$, where $y^{(i)}$ is the list of judgements of the relevance degree of the documents for $q_i$ ← listwise approach
- Hypothesis space ← neural network
- Loss function ← probability model on the space of permutations

SLIDE 14

The Listwise Approach

Queries: $Q = \{q^{(1)}, q^{(2)}, \ldots, q^{(m)}\}$, a set of $m$ queries.

List of documents: for query $q^{(i)}$ there are $n_i$ documents $d^{(i)} = (d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{n_i})$.

Feature representation in input space: $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_{n_i})$ with $x^{(i)}_j = \Psi(q^{(i)}, d^{(i)}_j)$, e.g.

$x^{(i)}_j = \big(\mathrm{BM25}(q^{(i)}, d^{(i)}_j),\ \mathrm{LM}(q^{(i)}, d^{(i)}_j),\ \mathrm{TFIDF}(q^{(i)}, d^{(i)}_j),\ \mathrm{PageRank}(d^{(i)}_j),\ \mathrm{URLdepth}(d^{(i)}_j)\big) \in \mathbb{R}^5$

List of judgement scores in output space: $y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_{n_i})$ with implicitly or explicitly given judgement scores $y^{(i)}_j$ for all documents corresponding to query $q^{(i)}$.

→ Training data set: $\mathcal{T} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$
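To make the data layout concrete, here is a minimal Python sketch of one training instance $(x^{(i)}, y^{(i)})$; the feature extractors below are hypothetical stand-ins for illustration, not the implementations used in the paper:

```python
import numpy as np

# Hypothetical stand-in feature extractors (illustration only, not the paper's).
def bm25(q, d):      return float(len(set(q.split()) & set(d.split())))
def lm(q, d):        return -0.01 * len(d)
def tfidf(q, d):     return float(d.split().count("rank"))
def pagerank(d):     return 0.15   # would come from the web link graph
def url_depth(d):    return 4.0    # would come from the document's URL

def psi(q, d):
    """Feature map Psi(q, d) in R^5, mirroring the slide's example."""
    return np.array([bm25(q, d), lm(q, d), tfidf(q, d), pagerank(d), url_depth(d)])

# One training instance (x^(i), y^(i)) for a query with n_i = 3 documents.
query = "learning to rank"
docs  = ["learning to rank documents", "classical ir models", "pagerank and url depth"]
x_i = np.stack([psi(query, d) for d in docs])  # list of feature vectors, shape (n_i, 5)
y_i = np.array([3.0, 1.0, 2.0])                # judgement scores for the documents
```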

SLIDE 16

What is a meaningful loss function?

We want to find a function $f: X \to Y$ such that the $f(x^{(i)})$ are "not very different" from the $y^{(i)}$.
→ The loss function should penalize differences that are too big.

Idea: just take NDCG! The perfectly ordered list can be derived from the given judgements $y^{(i)}$.

Problem: NDCG is discontinuous with respect to the ranking scores, since NDCG is position-based. Example:

  Training query with NDCG = 1:     $f(x^{(i)}) = (1.2,\ 0.7,\ 3.110,\ 3.109)$,  $y^{(i)} = (2, 1, 4, 3)$
  Training query with NDCG = 0.86:  $f(x^{(i)}) = (1.2,\ 0.7,\ 3.110,\ 3.111)$,  $y^{(i)} = (2, 1, 4, 3)$

A change of 0.002 in a single score swaps two positions in the ranked list and drops NDCG from 1 to 0.86.
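To see the jump numerically, here is a small sketch; it assumes the common NDCG variant with gain $2^{\mathrm{rel}} - 1$ and discount $1/\log_2(\mathrm{position}+1)$, which reproduces the 0.86 above:

```python
import numpy as np

def ndcg(scores, rels):
    """NDCG with gain 2^rel - 1 and discount 1/log2(position + 1)."""
    order = np.argsort(-np.asarray(scores))           # positions by descending score
    gains = 2.0 ** np.asarray(rels, dtype=float) - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(rels) + 2))
    dcg  = np.sum(gains[order] * discounts)           # DCG of the predicted ranking
    idcg = np.sum(np.sort(gains)[::-1] * discounts)   # DCG of the ideal ranking
    return dcg / idcg

rels = [2, 1, 4, 3]
print(ndcg([1.2, 0.7, 3.110, 3.109], rels))  # 1.0
print(ndcg([1.2, 0.7, 3.110, 3.111], rels))  # ~0.86: a 0.002 change flips two positions
```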

SLIDE 18

Loss function based on probability model on permutations

Solution: Define probability distributions $P_{y^{(i)}}$ and $P_{z^{(i)}}$ (for $z^{(i)} := (f(x^{(i)}_1), \ldots, f(x^{(i)}_{n_i}))$) on the set of permutations $\pi$ of $\{1, \ldots, n_i\}$, and take the cross entropy as loss function:

$L(y^{(i)}, z^{(i)}) := -\sum_{\pi} P_{y^{(i)}}(\pi) \log P_{z^{(i)}}(\pi)$,

which equals $\mathrm{KL}\big(P_{y^{(i)}}(\cdot) \,\|\, P_{z^{(i)}}(\cdot)\big)$ up to an additive constant (the entropy of $P_{y^{(i)}}$), so minimizing one minimizes the other.

How to define the probability distribution? E.g. for the set of permutations of $\{1, 2, 3\}$, the scores $(y_1, y_2, y_3)$ and the permutation $\pi := (1, 3, 2)$:

$P_y(\pi) := \dfrac{e^{y_1}}{e^{y_1} + e^{y_2} + e^{y_3}} \cdot \dfrac{e^{y_3}}{e^{y_2} + e^{y_3}} \cdot \dfrac{e^{y_2}}{e^{y_2}}$

Definition: If $\pi$ is a permutation of $\{1, \ldots, n\}$, its probability, given the list of scores $y$ of length $n$, is

$P_y(\pi) = \prod_{j=1}^{n} \dfrac{\exp(y_{\pi^{-1}(j)})}{\sum_{l=j}^{n} \exp(y_{\pi^{-1}(l)})}$
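A minimal numeric sketch of this permutation probability (a Plackett–Luce-style model). The permutation is passed as the list of objects in rank order, i.e. entry $j$ is $\pi^{-1}(j)$ (0-based), and the scores are made up for illustration:

```python
import numpy as np
from itertools import permutations

def perm_prob(y, ranked):
    """P_y(pi), with ranked[j] = the object placed at rank j (pi^{-1}, 0-based)."""
    e = np.exp(np.asarray(y, dtype=float))[list(ranked)]  # scores in rank order
    denom = np.cumsum(e[::-1])[::-1]                      # suffix sums: objects not yet ranked
    return float(np.prod(e / denom))

y = [2.0, 0.5, 1.0]
print(perm_prob(y, [0, 2, 1]))  # the slide's example pi = (1, 3, 2), written 0-based

# Sanity check: the probabilities of all n! permutations sum to 1.
print(sum(perm_prob(y, p) for p in permutations(range(3))))  # ~1.0
```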

SLIDE 19

Loss function based on probability model on permutations

For easier calculation, the algorithm instead uses the top-$k$ probability, with $k$ fixed:

$P_y(\pi) = \prod_{j=1}^{k} \dfrac{\exp(y_{\pi^{-1}(j)})}{\sum_{l=j}^{n} \exp(y_{\pi^{-1}(l)})}$

This depends only on the objects in the top $k$ positions of $\pi$, so the number of distinct probabilities drops from $n!$ to $n!/(n-k)!$.
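A sketch of the top-$k$ version under the same conventions as the block above; only the first $k$ ranks enter the product, and the probabilities of all ordered top-$k$ tuples still sum to 1:

```python
import numpy as np
from itertools import permutations

def top_k_prob(y, topk):
    """Top-k probability: the product over the first k ranks only."""
    e = np.exp(np.asarray(y, dtype=float))
    prob, remaining = 1.0, e.sum()
    for obj in topk:                 # objects in rank order, length k
        prob *= e[obj] / remaining   # chance of picking `obj` next
        remaining -= e[obj]
    return prob

y = [2.0, 0.5, 1.0, -0.3]
# All ordered top-2 tuples: n!/(n-k)! = 4!/2! = 12 terms instead of 4! = 24.
print(sum(top_k_prob(y, t) for t in permutations(range(4), 2)))  # ~1.0
```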

SLIDE 22

The ListNet algorithm

Advantage: the loss function is differentiable with respect to the score vectors!
→ We use functions $f_\omega$ from a neural network model as hypothesis space.
→ Learning task:

$\min_{\omega} \sum_{i=1}^{m} L(y^{(i)}, f_\omega(x^{(i)}))$

Implementation, based on gradient descent:

Algorithm 1: Learning Algorithm of ListNet
  Input: training data $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
  Parameters: number of iterations $T$ and learning rate $\eta$
  Initialize parameter $\omega$
  for $t = 1$ to $T$ do
    for $i = 1$ to $m$ do
      Input $x^{(i)}$ of query $q^{(i)}$ to the neural network and compute the score list $z^{(i)}(f_\omega)$ with the current $\omega$
      Compute gradient $\Delta\omega$ using Eq. (5)
      Update $\omega \leftarrow \omega - \eta \cdot \Delta\omega$
    end for
  end for
  Output: neural network model $\omega$
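A minimal runnable sketch of Algorithm 1, assuming a linear scoring function and the top-1 ($k = 1$) loss as in the paper's experiments; with $k = 1$ the loss reduces to a softmax cross entropy, and the gradient below plays the role of Eq. (5). The toy data is random, for illustration only:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def listnet_train(data, dim, T=100, eta=0.01):
    """Gradient descent for ListNet with a linear scorer f_w(x) = x @ w
    and the top-1 (k = 1) cross-entropy loss."""
    w = np.zeros(dim)
    for _ in range(T):
        for x, y in data:                           # x: (n_i, dim), y: (n_i,)
            z = x @ w                               # score list z^(i)(f_w) under current w
            grad = x.T @ (softmax(z) - softmax(y))  # gradient of -sum_j P_y(j) log P_z(j)
            w -= eta * grad                         # update w = w - eta * grad
    return w

# Toy data: two queries with 3 documents each and 5 features per document.
rng = np.random.default_rng(0)
data = [(rng.normal(size=(3, 5)), rng.normal(size=3)) for _ in range(2)]
print(listnet_train(data, dim=5))
```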

SLIDE 24

Experiments and Conclusion

The authors compared the ranking accuracy of ListNet with other learning-to-rank algorithms on three large-scale data sets (TREC, OHSUMED and CSearch, with 20, 30 and 600 features respectively).

Procedure: divide the data into a training subset and a testing subset, and apply traditional evaluation metrics (NDCG, MAP) on the testing set.

Conclusions:
- ListNet outperforms algorithms based on the pairwise approach (RankNet, Ranking SVM, RankBoost).
- Drawback: high training complexity ($O(n^k)$ terms for list length $n$ and top-$k$ parameter $k$).

SLIDE 25

For Further Reading

Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., Li, H.: Learning to Rank: From Pairwise Approach to Listwise Approach. In: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp. 129–136 (2007).

Liu, T.-Y.: Learning to Rank for Information Retrieval. Springer (2011).

Thank you for your attention!
