SLIDE 1

Learning to Rank: From Pairwise Approach to Listwise Approach

Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li

Microsoft Research Asia, Beijing (2007)

Presented by Christian Kümmerle December 2, 2014

SLIDE 2

Content

1. Framework: Learning to Rank
2. The Listwise Approach
3. Loss function based on probability model
4. ListNet algorithm
5. Experiments and Conclusion

SLIDE 6

Framework: Learning to Rank

What is Learning to Rank?

- Classical IR ranking task: given a query, rank the documents into a list.
- Query-dependent ranking functions: vector space model, BM25, language model.
- Query-independent features of documents, e.g.:
  - PageRank
  - URL depth: e.g. http://sifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/lectures.html has a depth of 4.

→ How can we combine all these "features" in order to get a better ranking function?

SLIDE 11

Framework: Learning to Rank

What is Learning to Rank?

Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents.

Supervised learning, in the authors' paper:

- Input space: $X = \{x^{(1)}, x^{(2)}, \ldots\}$, where $x^{(i)}$ is the list of feature representations of the documents for query $q_i$ ← listwise approach
- Output space: $Y = \{y^{(1)}, y^{(2)}, \ldots\}$, where $y^{(i)}$ is the list of judgements of the relevance degree of the documents for $q_i$ ← listwise approach
- Hypothesis space ← neural network
- Loss function ← probability model on the space of permutations

SLIDE 14

The Listwise Approach

Queries: $Q = \{q^{(1)}, q^{(2)}, \ldots, q^{(m)}\}$, a set of $m$ queries.

List of documents: for query $q^{(i)}$ there are $n_i$ documents $d^{(i)} = (d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{n_i})$.

Feature representation in input space: $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_{n_i})$ with $x^{(i)}_j = \Psi(q^{(i)}, d^{(i)}_j)$, e.g.

$x^{(i)}_j = \big(\mathrm{BM25}(q^{(i)}, d^{(i)}_j),\ \mathrm{LM}(q^{(i)}, d^{(i)}_j),\ \mathrm{TFIDF}(q^{(i)}, d^{(i)}_j),\ \mathrm{PageRank}(d^{(i)}_j),\ \mathrm{URLdepth}(d^{(i)}_j)\big) \in \mathbb{R}^5$

List of judgement scores in output space: $y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_{n_i})$ with implicitly or explicitly given judgement scores $y^{(i)}_j$ for all documents corresponding to query $q^{(i)}$.

→ Training data set: $\mathcal{T} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$
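To make the data layout concrete, here is a minimal Python sketch of one training instance $(x^{(i)}, y^{(i)})$; the feature extractors below are hypothetical stand-ins for illustration, not the implementations used in the paper:

```python
import numpy as np

# Hypothetical stand-in feature extractors (illustration only, not the paper's).
def bm25(q, d):      return float(len(set(q.split()) & set(d.split())))
def lm(q, d):        return -0.01 * len(d)
def tfidf(q, d):     return float(d.split().count("rank"))
def pagerank(d):     return 0.15   # would come from the web link graph
def url_depth(d):    return 4.0    # would come from the document's URL

def psi(q, d):
    """Feature map Psi(q, d) in R^5, mirroring the slide's example."""
    return np.array([bm25(q, d), lm(q, d), tfidf(q, d), pagerank(d), url_depth(d)])

# One training instance (x^(i), y^(i)) for a query with n_i = 3 documents.
query = "learning to rank"
docs  = ["learning to rank documents", "classical ir models", "pagerank and url depth"]
x_i = np.stack([psi(query, d) for d in docs])  # list of feature vectors, shape (n_i, 5)
y_i = np.array([3.0, 1.0, 2.0])                # judgement scores for the documents
```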

SLIDE 16

What is a meaningful loss function?

We want to find a function $f: X \to Y$ such that the $f(x^{(i)})$ are "not very different" from the $y^{(i)}$.
→ The loss function should penalize differences that are too big.

Idea: just take NDCG! The perfectly ordered list can be derived from the given judgements $y^{(i)}$.

Problem: NDCG is discontinuous with respect to the ranking scores, since NDCG is position-based. Example:

  Training query with NDCG = 1:     $f(x^{(i)}) = (1.2,\ 0.7,\ 3.110,\ 3.109)$,  $y^{(i)} = (2, 1, 4, 3)$
  Training query with NDCG = 0.86:  $f(x^{(i)}) = (1.2,\ 0.7,\ 3.110,\ 3.111)$,  $y^{(i)} = (2, 1, 4, 3)$

A change of 0.002 in a single score swaps two positions in the ranked list and drops NDCG from 1 to 0.86.
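To see the jump numerically, here is a small sketch; it assumes the common NDCG variant with gain $2^{\mathrm{rel}} - 1$ and discount $1/\log_2(\mathrm{position}+1)$, which reproduces the 0.86 above:

```python
import numpy as np

def ndcg(scores, rels):
    """NDCG with gain 2^rel - 1 and discount 1/log2(position + 1)."""
    order = np.argsort(-np.asarray(scores))           # positions by descending score
    gains = 2.0 ** np.asarray(rels, dtype=float) - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(rels) + 2))
    dcg  = np.sum(gains[order] * discounts)           # DCG of the predicted ranking
    idcg = np.sum(np.sort(gains)[::-1] * discounts)   # DCG of the ideal ranking
    return dcg / idcg

rels = [2, 1, 4, 3]
print(ndcg([1.2, 0.7, 3.110, 3.109], rels))  # 1.0
print(ndcg([1.2, 0.7, 3.110, 3.111], rels))  # ~0.86: a 0.002 change flips two positions
```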

SLIDE 18

Loss function based on probability model on permutations

Solution: Define probability distributions $P_{y^{(i)}}$ and $P_{z^{(i)}}$ (for $z^{(i)} := (f(x^{(i)}_1), \ldots, f(x^{(i)}_{n_i}))$) on the set of permutations $\pi$ of $\{1, \ldots, n_i\}$, and take the cross entropy as loss function:

$L(y^{(i)}, z^{(i)}) := -\sum_{\pi} P_{y^{(i)}}(\pi) \log P_{z^{(i)}}(\pi)$,

which equals $\mathrm{KL}\big(P_{y^{(i)}}(\cdot) \,\|\, P_{z^{(i)}}(\cdot)\big)$ up to an additive constant (the entropy of $P_{y^{(i)}}$), so minimizing one minimizes the other.

How to define the probability distribution? E.g. for the set of permutations of $\{1, 2, 3\}$, the scores $(y_1, y_2, y_3)$ and the permutation $\pi := (1, 3, 2)$:

$P_y(\pi) := \dfrac{e^{y_1}}{e^{y_1} + e^{y_2} + e^{y_3}} \cdot \dfrac{e^{y_3}}{e^{y_2} + e^{y_3}} \cdot \dfrac{e^{y_2}}{e^{y_2}}$

Definition: If $\pi$ is a permutation of $\{1, \ldots, n\}$, its probability, given the list of scores $y$ of length $n$, is

$P_y(\pi) = \prod_{j=1}^{n} \dfrac{\exp(y_{\pi^{-1}(j)})}{\sum_{l=j}^{n} \exp(y_{\pi^{-1}(l)})}$
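A minimal numeric sketch of this permutation probability (a Plackett–Luce-style model). The permutation is passed as the list of objects in rank order, i.e. entry $j$ is $\pi^{-1}(j)$ (0-based), and the scores are made up for illustration:

```python
import numpy as np
from itertools import permutations

def perm_prob(y, ranked):
    """P_y(pi), with ranked[j] = the object placed at rank j (pi^{-1}, 0-based)."""
    e = np.exp(np.asarray(y, dtype=float))[list(ranked)]  # scores in rank order
    denom = np.cumsum(e[::-1])[::-1]                      # suffix sums: objects not yet ranked
    return float(np.prod(e / denom))

y = [2.0, 0.5, 1.0]
print(perm_prob(y, [0, 2, 1]))  # the slide's example pi = (1, 3, 2), written 0-based

# Sanity check: the probabilities of all n! permutations sum to 1.
print(sum(perm_prob(y, p) for p in permutations(range(3))))  # ~1.0
```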

SLIDE 19

Loss function based on probability model on permutations

For easier calculation, the algorithm instead uses the top-$k$ probability, with $k$ fixed:

$P_y(\pi) = \prod_{j=1}^{k} \dfrac{\exp(y_{\pi^{-1}(j)})}{\sum_{l=j}^{n} \exp(y_{\pi^{-1}(l)})}$

This depends only on the objects in the top $k$ positions of $\pi$, so the number of distinct probabilities drops from $n!$ to $n!/(n-k)!$.
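A sketch of the top-$k$ version under the same conventions as the block above; only the first $k$ ranks enter the product, and the probabilities of all ordered top-$k$ tuples still sum to 1:

```python
import numpy as np
from itertools import permutations

def top_k_prob(y, topk):
    """Top-k probability: the product over the first k ranks only."""
    e = np.exp(np.asarray(y, dtype=float))
    prob, remaining = 1.0, e.sum()
    for obj in topk:                 # objects in rank order, length k
        prob *= e[obj] / remaining   # chance of picking `obj` next
        remaining -= e[obj]
    return prob

y = [2.0, 0.5, 1.0, -0.3]
# All ordered top-2 tuples: n!/(n-k)! = 4!/2! = 12 terms instead of 4! = 24.
print(sum(top_k_prob(y, t) for t in permutations(range(4), 2)))  # ~1.0
```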

SLIDE 22

The ListNet algorithm

Advantage: the loss function is differentiable with respect to the score vectors!
→ We use functions $f_\omega$ from a neural network model as hypothesis space.
→ Learning task:

$\min_{\omega} \sum_{i=1}^{m} L(y^{(i)}, f_\omega(x^{(i)}))$

Implementation, based on gradient descent:

Algorithm 1: Learning Algorithm of ListNet
  Input: training data $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
  Parameters: number of iterations $T$ and learning rate $\eta$
  Initialize parameter $\omega$
  for $t = 1$ to $T$ do
    for $i = 1$ to $m$ do
      Input $x^{(i)}$ of query $q^{(i)}$ to the neural network and compute the score list $z^{(i)}(f_\omega)$ with the current $\omega$
      Compute gradient $\Delta\omega$ using Eq. (5)
      Update $\omega \leftarrow \omega - \eta \cdot \Delta\omega$
    end for
  end for
  Output: neural network model $\omega$
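A minimal runnable sketch of Algorithm 1, assuming a linear scoring function and the top-1 ($k = 1$) loss as in the paper's experiments; with $k = 1$ the loss reduces to a softmax cross entropy, and the gradient below plays the role of Eq. (5). The toy data is random, for illustration only:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def listnet_train(data, dim, T=100, eta=0.01):
    """Gradient descent for ListNet with a linear scorer f_w(x) = x @ w
    and the top-1 (k = 1) cross-entropy loss."""
    w = np.zeros(dim)
    for _ in range(T):
        for x, y in data:                           # x: (n_i, dim), y: (n_i,)
            z = x @ w                               # score list z^(i)(f_w) under current w
            grad = x.T @ (softmax(z) - softmax(y))  # gradient of -sum_j P_y(j) log P_z(j)
            w -= eta * grad                         # update w = w - eta * grad
    return w

# Toy data: two queries with 3 documents each and 5 features per document.
rng = np.random.default_rng(0)
data = [(rng.normal(size=(3, 5)), rng.normal(size=3)) for _ in range(2)]
print(listnet_train(data, dim=5))
```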

SLIDE 24

Experiments and Conclusion

The authors compared the ranking accuracy of ListNet with other learning-to-rank algorithms on three large-scale data sets (TREC, OHSUMED and CSearch, with 20, 30 and 600 features respectively).

Procedure: divide the data into a training subset and a testing subset, and apply traditional evaluation metrics (NDCG, MAP) on the testing set.

Conclusions:
- ListNet outperforms algorithms based on the pairwise approach (RankNet, Ranking SVM, RankBoost).
- Drawback: high training complexity ($O(n^k)$ terms for list length $n$ and top-$k$ parameter $k$).

SLIDE 25

For Further Reading

Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., Li, H.: Learning to Rank: From Pairwise Approach to Listwise Approach. In: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp. 129–136 (2007).

Liu, T.-Y.: Learning to Rank for Information Retrieval. Springer (2011).

Thank you for your attention!
