Machine Learning for Ranking
CE-324: Modern Information Retrieval
Sharif University of Technology
M. Soleymani
Fall 2015
Some slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
Machine learning for IR
We have looked at methods for ranking docs in IR: cosine similarity, idf, proximity, pivoted doc length normalization, …
We have looked at methods for classifying docs using supervised ML: Naïve Bayes, Rocchio, kNN
Using ML to rank the docs displayed in search results sounds like a good idea
A.k.a. "machine-learned relevance" or "learning to rank"
This "good idea" has been actively researched, and deployed by web search engines, in the last decade
Why didn't it happen earlier? Modern supervised ML has been around for about 20 years…
Naïve Bayes has been around for about 50 years…
There were some early precursors:
Wong, S.K. et al. 1988. Linear structure in information retrieval. SIGIR 1988.
Fuhr, N. 1992. Probabilistic models in information retrieval. The Computer Journal.
Gey, F. C. 1994. Inferring probability of relevance using the method of logistic regression. SIGIR 1994.
Herbrich, R. et al. 2000. Large Margin Rank Boundaries for Ordinal Regression. Advances in Large Margin Classifiers.
Limited training data: especially for real world use (as opposed to writing academic papers), it was hard to gather test-collection queries and relevance judgments representative of real user needs
This has changed, both in academia and industry
Traditional ranking functions in IR used a very small number of features:
Term frequency
Inverse document frequency
Doc length
It was easy to tune weighting coefficients by hand – and people did
Arbitrary useful features – not a single unified model
Log frequency of query word in anchor text?
Query word in color on page?
# of images on page?
# of (out) links on page?
PageRank of page?
URL length?
URL contains "~"?
Page edit recency?
Page length?
Simple example: learning weighted zone scoring
Only two zones: title, body
score(𝑑, 𝑞) = 𝛽 ⋅ s_T(𝑑, 𝑞) + (1 − 𝛽) ⋅ s_B(𝑑, 𝑞)
We intend to find an optimal value for 𝛽
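A minimal sketch of fitting 𝛽 from (q, d, r) training triples, assuming Boolean zone scores as in the standard weighted-zone-scoring treatment (the function name train_beta is illustrative):

def train_beta(examples):
    # examples: list of (s_title, s_body, relevant) triples, where the
    # zone scores are Boolean matches and relevant is the human judgment.
    # Only examples where the two zones disagree affect the optimum.
    n10r = sum(1 for st, sb, r in examples if st and not sb and r)
    n10n = sum(1 for st, sb, r in examples if st and not sb and not r)
    n01r = sum(1 for st, sb, r in examples if not st and sb and r)
    n01n = sum(1 for st, sb, r in examples if not st and sb and not r)
    denom = n10r + n10n + n01r + n01n
    # Closed-form minimizer of the training error over 𝛽 in [0, 1].
    return (n10r + n01n) / denom if denom else 0.5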
In the simplest form, each relevance judgment is either relevant or nonrelevant
Relevance r is here binary (but it may be multiclass, with 3–7 levels)
Each doc-query pair is represented by a feature vector 𝒚 = (𝑦1, 𝑦2)
𝑦1: cosine similarity
𝑦2: minimum query window size (shortest text span including all query terms; see the sketch below)
Query term proximity is a very important new weighting factor
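A minimal sketch of the 𝑦2 feature, assuming a tokenized doc (the name min_window is illustrative): slide a window over the doc and keep the shortest span that covers every query term.

from collections import Counter

def min_window(doc_tokens, query_terms):
    # Length of the shortest span of doc_tokens containing every
    # query term; None if some query term never occurs in the doc.
    need = set(query_terms)
    if not need <= set(doc_tokens):
        return None
    counts, covered, left = Counter(), 0, 0
    best = len(doc_tokens)
    for right, tok in enumerate(doc_tokens):
        if tok in need:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        while covered == len(need):        # window covers all terms
            best = min(best, right - left + 1)
            ltok = doc_tokens[left]        # shrink from the left
            if ltok in need:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    covered -= 1
            left += 1
    return best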
[Figure: training examples plotted in the (𝑦1, 𝑦2) feature plane, each labeled R (relevant) or N (nonrelevant)]
A linear scoring function:
Score(𝑑, 𝑞) = 𝑤1𝑦1(𝑑, 𝑞) + ⋯ + 𝑤𝑛𝑦𝑛(𝑑, 𝑞) + 𝑐
Training: Score(𝑑, 𝑞) must be negative for nonrelevant docs and positive for relevant docs
Testing: decide relevant iff Score(𝑑, 𝑞) ≥ 0
To deal with query words not in your training data, avoid word-specific features; scores like the summed (log) tf of all query terms generalize across queries
Problem: the data are heavily skewed toward nonrelevant docs, which can result in trivial always-say-nonrelevant classifiers
A solution: undersample nonrelevant docs during training (just take some of them)
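A minimal sketch of this classifier, assuming scikit-learn; the function name and the 1:1 undersampling default are illustrative choices, not prescribed by the slides.

import random
from sklearn.svm import LinearSVC

def train_relevance_classifier(X, y, neg_per_pos=1, seed=0):
    # X: per-(doc, query) feature vectors; y: binary relevance labels.
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    # Undersample the dominant nonrelevant class before training.
    keep = rng.sample(neg, min(len(neg), neg_per_pos * len(pos)))
    idx = pos + keep
    clf = LinearSVC()
    clf.fit([X[i] for i in idx], [y[i] for i in idx])
    return clf

# Decide relevant iff the learned linear score is >= 0:
# clf.decision_function([features])[0] >= 0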
Experiments: 4 TREC data sets
Comparisons with Lemur, a state-of-the-art open source IR engine
6 features, all variants of tf, idf, and tf.idf scores
At best the results are about equal to the baseline – actually a little bit below
But it is easy to add more features; this is illustrated on a homepage finding task on WT10G:
Baseline LM: 52% p@10; baseline SVM: 58% p@10
SVM with URL-depth and in-link features: 78% p@10
Classification: map to an unordered set of classes
Regression: map to a real value
Ordinal regression: map to an ordered set of classes
A fairly obscure sub-branch of statistics, but what we want here: relations between relevance levels are modeled
Assume a number of categories of relevance exist, totally ordered: 𝑑1 < 𝑑2 < ⋯ < 𝑑𝐿
Training data: each doc-query pair is represented as a feature vector 𝜒𝑗 with an ordered relevance label
Pointwise approaches predict a class label or relevance score for each doc independently
Pairwise approach: predicting relative order is closer to the nature of ranking than predicting a class label or relevance score
Input is a pair of results for a query, and the class is the relevance ordering relationship between them
Learns a ranking function and models the ranking problem in a straightforward fashion
It can overcome the drawbacks of the pointwise approach by tackling the ranking problem more directly
Drawbacks of the pointwise approach:
The position of docs in the ranked list is invisible to the loss function
When the number of associated docs varies largely across queries, the pointwise loss may unconsciously put too much emphasis on queries with many docs
The pairwise approach turns the ordinal regression problem back into binary classification (see the sketch below)
Predicting relative order is closer to the nature of ranking than predicting a class label or relevance score
Drawback: the distribution of the number of doc pairs across queries is even more skewed than the distribution of the number of docs
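A minimal sketch of this pairwise reduction (a RankSVM-style transform; the function name and use of NumPy are illustrative): for each query, every pair of docs with different relevance labels yields one binary example on the difference of their feature vectors.

import numpy as np

def pairwise_transform(X, y, qid):
    # X: per-doc feature vectors; y: relevance labels; qid: query id
    # per row. Returns difference vectors labeled +1/-1 according to
    # which doc of the pair should rank higher.
    X, y, qid = np.asarray(X), np.asarray(y), np.asarray(qid)
    X_pairs, y_pairs = [], []
    for q in np.unique(qid):
        idx = np.flatnonzero(qid == q)   # only pair docs within a query
        for i in idx:
            for j in idx:
                if y[i] > y[j]:          # doc i should rank above doc j
                    X_pairs.append(X[i] - X[j]); y_pairs.append(+1)
                    X_pairs.append(X[j] - X[i]); y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

# A binary linear classifier trained on these pairs yields a scoring
# function w·x whose induced ranking agrees with the learned preferences.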
Typical base features: log term frequency, idf, pivoted length normalization
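A minimal sketch of these three classic components (function names and the pivot parameter s are illustrative):

import math

def log_tf(tf):
    # Sublinear tf scaling: 1 + log10(tf), and 0 if the term is absent.
    return 1 + math.log10(tf) if tf > 0 else 0.0

def idf(N, df):
    # Inverse document frequency for a term occurring in df of N docs.
    return math.log10(N / df)

def pivoted_norm(doc_len, avg_doc_len, s=0.2):
    # Pivoted length normalizer; a doc's score is divided by this value.
    return 1 - s + s * (doc_len / avg_doc_len)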