Learning to Rank
Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net 2019 EE448, Big Data Mining, Lecture 9
http://wnzhang.net/teaching/ee448/index.html
Content of This Course
- Another ML problem: ranking
- Learning to rank
  - Pointwise methods
  - Pairwise methods
  - Listwise methods
Sincere thanks to Dr. Tie-Yan Liu.
book/html/htmledition/the-probability-ranking-principle-1.html
$L(y_i, f_\theta(x_i)) = \frac{1}{2}\left(y_i - f_\theta(x_i)\right)^2$
$\min_\theta \frac{1}{N} \sum_{i=1}^{N} L(y_i, f_\theta(x_i))$
$L(y_i, f_\theta(x_i)) = -y_i \log f_\theta(x_i) - (1 - y_i) \log(1 - f_\theta(x_i))$
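The two pointwise losses above can be computed directly; a minimal NumPy sketch (the function names are illustrative, not from any library):

```python
import numpy as np

def squared_loss(y, f):
    # L(y, f_theta(x)) = 1/2 (y - f_theta(x))^2
    return 0.5 * (y - f) ** 2

def cross_entropy_loss(y, f):
    # L(y, f_theta(x)) = -y log f - (1 - y) log(1 - f), for y in {0, 1}
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

# Averaged over N examples, as in the training objective
y_true = np.array([1.0, 0.0, 1.0])
scores = np.array([0.9, 0.2, 0.6])
print(np.mean(squared_loss(y_true, scores)))
print(np.mean(cross_entropy_loss(y_true, scores)))
```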
$X = \{x_1, x_2, \ldots, x_n\}$
$\hat{Y} = \{x_{r_1}, x_{r_2}, \ldots, x_{r_n}\}$
$Y = \{x_{y_1}, x_{y_2}, \ldots, x_{y_n}\}$
$D = \{d_i\}$, query $q$
[Diagram: a query is issued against an indexed document repository; the ranking model returns a ranked list of documents]
For the query $q$ = "ML in China", the model returns a ranked list:
$d^q_1$ = https://www.crunchbase.com
$d^q_2$ = https://www.reddit.com
...
$d^q_n$ = https://www.quora.com
having the following two properties
list of features
$y_i = f_\theta(x_i)$
ratings
$y_i = f_\theta(x_i)$
Features (Query = 'ML in China'):

| Rating | Document                 | Query Length | Doc PageRank | Doc Length | Title Rel. | Content Rel. |
|--------|--------------------------|--------------|--------------|------------|------------|--------------|
| 3      | d1=http://crunchbase.com | 0.30         | 0.61         | 0.47       | 0.54       | 0.76         |
| 5      | d2=http://reddit.com     | 0.30         | 0.81         | 0.76       | 0.91       | 0.81         |
| 4      | d3=http://quora.com      | 0.30         | 0.86         | 0.56       | 0.96       | 0.69         |

Query Length is a query feature; Doc PageRank and Doc Length are document features; Title Rel. and Content Rel. are query-doc features.
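To illustrate scoring with such feature vectors, here is a sketch using a hypothetical linear scorer $f_\theta(x) = \theta^\top x$; the weight vector `theta` below is made up for the example, not learned:

```python
import numpy as np

# Feature vectors x_i for (q, d_i), copied from the table above:
# [query length, doc PageRank, doc length, title rel., content rel.]
X = {
    "d1": np.array([0.30, 0.61, 0.47, 0.54, 0.76]),
    "d2": np.array([0.30, 0.81, 0.76, 0.91, 0.81]),
    "d3": np.array([0.30, 0.86, 0.56, 0.96, 0.69]),
}

theta = np.array([0.1, 1.0, 0.2, 1.0, 2.0])  # hypothetical weights

def f(x, theta):
    # linear scoring function f_theta(x) = theta . x
    return float(theta @ x)

# rank the documents by descending score
ranking = sorted(X, key=lambda d: f(X[d], theta), reverse=True)
print(ranking)  # ['d2', 'd3', 'd1'], matching the ratings 5 > 4 > 3
```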
rank the documents given a query
Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer 2011.
http://www.cda.cn/uploadfile/image/20151220/20151220115436_46293.pdf
Features (Query = 'ML in China')
$y_i = f_\theta(x_i)$
$\min_\theta \frac{1}{2N} \sum_{i=1}^{N} \left(y_i - f_\theta(x_i)\right)^2$
Relevancy example: Doc 1 has ground truth 3 and Doc 2 has ground truth 4. The prediction pairs (2.4, 4.6) and (3.6, 3.4) each miss the ground truth by 0.6 per document, so they incur the same pointwise regression loss, yet the second pair interchanges the ranking of the two documents.
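A quick check of this point: the two prediction pairs below incur identical squared-error loss, yet only one preserves the ground-truth order.

```python
# Ground truth from the example above: Doc 1 = 3, Doc 2 = 4
truth = (3.0, 4.0)
pred_a = (2.4, 4.6)  # preserves the order: Doc 2 above Doc 1
pred_b = (3.6, 3.4)  # interchanges the ranking

def sq_err(truth, pred):
    # total pointwise squared-error loss over the two documents
    return sum(0.5 * (y - f) ** 2 for y, f in zip(truth, pred))

print(sq_err(truth, pred_a), sq_err(truth, pred_b))  # identical losses
print(pred_a[1] > pred_a[0], pred_b[1] > pred_b[0])  # True, False
```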
relative preference on a document pair
For query $q^{(i)}$, the rated document list is
$\begin{bmatrix} d^{(i)}_1, 5 \\ d^{(i)}_2, 3 \\ \vdots \\ d^{(i)}_{n^{(i)}}, 2 \end{bmatrix}$
Transform into document pairs
$\left\{ (d^{(i)}_1, d^{(i)}_2), (d^{(i)}_1, d^{(i)}_{n^{(i)}}), \ldots, (d^{(i)}_2, d^{(i)}_{n^{(i)}}) \right\}$
with pairwise preferences for query $q^{(i)}$: $5 > 3$, $5 > 2$, $3 > 2$.
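The transform above can be sketched as follows (the document names and the tie handling are illustrative):

```python
from itertools import combinations

# Rated list for one query, as above: (document, rating)
rated = [("d1", 5), ("d2", 3), ("dn", 2)]

# Turn ratings into ordered preference pairs (preferred doc first)
pairs = []
for (di, ri), (dj, rj) in combinations(rated, 2):
    if ri > rj:
        pairs.append((di, dj))
    elif rj > ri:
        pairs.append((dj, di))
    # ties (ri == rj) contribute no training pair

print(pairs)  # [('d1', 'd2'), ('d1', 'dn'), ('d2', 'dn')]
```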
Given query $q$ and a document pair $(d_i, d_j)$, the pairwise label is
$y_{i,j} = \begin{cases} 1 & \text{if } d_i \succ d_j \\ 0 & \text{otherwise} \end{cases}$
$P_{i,j} = P(d_i \succ d_j \mid q) = \frac{\exp(o_{i,j})}{1 + \exp(o_{i,j})}$, where $o_{i,j} = f_\theta(x_i) - f_\theta(x_j)$
xi is the feature vector of (q, di)
$L(q, d_i, d_j) = -y_{i,j} \log P_{i,j} - (1 - y_{i,j}) \log(1 - P_{i,j})$
neural network
Burges, Christopher JC, Robert Ragno, and Quoc Viet Le. "Learning to rank with nonsmooth cost functions." NIPS. Vol. 6. 2006.
$f_\theta(x_i)$
$\frac{\partial L(q, d_i, d_j)}{\partial \theta} = \frac{\partial L(q, d_i, d_j)}{\partial P_{i,j}} \frac{\partial P_{i,j}}{\partial o_{i,j}} \frac{\partial o_{i,j}}{\partial \theta} = \frac{\partial L(q, d_i, d_j)}{\partial P_{i,j}} \frac{\partial P_{i,j}}{\partial o_{i,j}} \left( \frac{\partial f_\theta(x_i)}{\partial \theta} - \frac{\partial f_\theta(x_j)}{\partial \theta} \right)$
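Putting the chain rule together: for the cross-entropy loss with the sigmoid pairwise probability, $\partial L / \partial o_{i,j}$ simplifies to $P_{i,j} - y_{i,j}$. A minimal sketch with a hypothetical linear scorer (not the neural network used in the original work), where the gradient of $f_\theta$ w.r.t. $\theta$ is just the feature vector:

```python
import numpy as np

def ranknet_grad(theta, xi, xj, y):
    # o_ij = f(x_i) - f(x_j) for a linear scorer f_theta(x) = theta . x
    o = theta @ (xi - xj)
    P = 1.0 / (1.0 + np.exp(-o))  # P_ij = sigmoid(o_ij)
    # chain rule: dL/dtheta = (P_ij - y_ij) * (df(x_i)/dtheta - df(x_j)/dtheta)
    return (P - y) * (xi - xj)

theta = np.zeros(3)
xi, xj = np.array([1.0, 0.5, 0.2]), np.array([0.2, 0.1, 0.9])
for _ in range(100):  # gradient descent on one pair with d_i preferred (y = 1)
    theta -= 0.5 * ranknet_grad(theta, xi, xj, 1.0)
print(theta @ xi > theta @ xj)  # True: the model now ranks d_i above d_j
```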
$P_{i,j} = P(d_i \succ d_j \mid q) = \frac{\exp(o_{i,j})}{1 + \exp(o_{i,j})}$
$L(q, d_i, d_j) = -y_{i,j} \log P_{i,j} - (1 - y_{i,j}) \log(1 - P_{i,j})$
BP in NN
Documents with different importance; ratings: 2, 4, 3, 2, 4
$P@k = \frac{\#\{\text{relevant documents in top } k \text{ results}\}}{k}$
$AP = \frac{\sum_k P@k \cdot y_i(k)}{\#\{\text{relevant documents}\}}$
$y_i = \begin{cases} 1 & \text{if } d_i \text{ is relevant to } q \\ 0 & \text{otherwise} \end{cases}$
With relevant documents at ranks 1, 3 and 5:
$AP = \frac{1}{3}\left(\frac{1}{1} + \frac{2}{3} + \frac{3}{5}\right)$
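P@k and AP can be sketched directly from the definitions (`rels` is a hypothetical 0/1 relevance list in ranked order; the function names are illustrative):

```python
def precision_at_k(rels, k):
    # P@k = (# relevant documents in top k results) / k
    return sum(rels[:k]) / k

def average_precision(rels):
    # AP = sum of P@k over ranks k where the document is relevant,
    #      divided by the total number of relevant documents
    hits = [precision_at_k(rels, k)
            for k in range(1, len(rels) + 1) if rels[k - 1] == 1]
    return sum(hits) / sum(rels)

# Example from the slide: relevant documents at ranks 1, 3 and 5
rels = [1, 0, 1, 0, 1]
print(average_precision(rels))  # 1/3 * (1/1 + 2/3 + 3/5)
```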
for query q
$NDCG@k = Z_k \sum_{j=1}^{k} \frac{2^{y_i(j)} - 1}{\log(j+1)}$
$y_i \in \{0, 1, 2, 3, 4\}$
$NDCG@k = \underbrace{Z_k}_{\text{normalizer}} \sum_{j=1}^{k} \underbrace{\left(2^{y_i(j)} - 1\right)}_{\text{gain}} \cdot \underbrace{\frac{1}{\log(j+1)}}_{\text{discount}}$
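A sketch of NDCG@k from the definition, assuming the conventional base-2 logarithm in the discount and taking $Z_k$ as the reciprocal of the ideal DCG, so that a perfect ranking scores 1 (`ndcg_at_k` is an illustrative helper, not a library function):

```python
import math

def ndcg_at_k(ratings_ranked, k):
    # DCG@k = sum_{j=1..k} (2^{y(j)} - 1) / log2(j + 1), positions 1-indexed
    def dcg(rs):
        return sum((2 ** y - 1) / math.log2(j + 2) for j, y in enumerate(rs[:k]))
    # Z_k normalizes by the DCG of the ideal (rating-sorted) ranking
    ideal = dcg(sorted(ratings_ranked, reverse=True))
    return dcg(ratings_ranked) / ideal if ideal > 0 else 0.0

ratings = [2, 4, 3, 2, 4]  # ratings of returned documents, in ranked order
print(ndcg_at_k(ratings, 5))
```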
Listwise methods define the loss directly between the prediction list and the ground truth list, e.g. as a divergence between the predicted permutation distribution and the ground truth.
Cao, Zhe, et al. "Learning to rank: from pairwise approach to listwise approach." Proceedings of the 24th international conference on Machine learning. ACM, 2007.
$y_i = f_\theta(x_i)$, scores $\{y_i\}_{i=1 \ldots n}$
$P_f([j_1, j_2, \ldots, j_k]) = \prod_{t=1}^{k} \frac{\exp(f(x_{j_t}))}{\sum_{l=t}^{n} \exp(f(x_{j_l}))}$
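The top-k (Plackett-Luce) probability above can be computed term by term: at each step, the next-placed document competes against all documents not yet placed. `topk_prob` is a hypothetical helper; summing over all full permutations checks that the distribution is properly normalized.

```python
import math
from itertools import permutations

def topk_prob(scores, prefix):
    # P_f([j1..jk]) = prod over t of exp(f(x_{j_t})) / sum over unplaced docs
    remaining = list(range(len(scores)))
    p = 1.0
    for j in prefix:
        denom = sum(math.exp(scores[l]) for l in remaining)
        p *= math.exp(scores[j]) / denom
        remaining.remove(j)
    return p

scores = [2.0, 1.0, 0.5]
total = sum(topk_prob(scores, perm) for perm in permutations(range(3)))
print(total)  # full-permutation probabilities sum to 1
```

Note that for k = 1 this reduces to a softmax over the scores.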
$L(y, f(x)) = -\sum_{g \in G_k} P_y(g) \log P_f(g)$
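Summing over all prefixes in $G_k$ is expensive, so ListNet commonly uses the top-1 simplification, under which $P_y$ and $P_f$ reduce to softmax distributions over the scores and the loss is a plain cross entropy. A sketch under that assumption (function names are illustrative):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def listnet_top1_loss(y, f):
    # L(y, f(x)) = - sum_g P_y(g) log P_f(g), with g ranging over top-1
    # groups ("document j ranked first"), each a softmax probability
    Py, Pf = softmax(y), softmax(f)
    return -sum(py * math.log(pf) for py, pf in zip(Py, Pf))

print(listnet_top1_loss([5.0, 3.0, 2.0], [2.0, 1.0, 0.5]))
```

The loss is minimized when the predicted top-1 distribution matches the ground-truth one, i.e. when the scores induce the same softmax.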
Tie-Yan Liu @ Tutorial at WWW 2008
complexity
Burges, Christopher JC, Robert Ragno, and Quoc Viet Le. "Learning to rank with nonsmooth cost functions." NIPS. Vol. 6. 2006.
$\frac{\partial L(q, d_i, d_j)}{\partial \theta} = \underbrace{\frac{\partial L(q, d_i, d_j)}{\partial P_{i,j}} \frac{\partial P_{i,j}}{\partial o_{i,j}}}_{\lambda_{i,j}} \left( \frac{\partial f_\theta(x_i)}{\partial \theta} - \frac{\partial f_\theta(x_j)}{\partial \theta} \right)$
$\lambda_{i,j}$ comes from the pairwise ranking loss and the scoring function itself. LambdaRank replaces $\lambda_{i,j}$ with $h(\lambda_{i,j}, g_q)$, where $g_q$ is the current ranking list for query $q$.
$\frac{\partial L(q, d_i, d_j)}{\partial \theta} = h(\lambda_{i,j}, g_q) \left( \frac{\partial f_\theta(x_i)}{\partial \theta} - \frac{\partial f_\theta(x_j)}{\partial \theta} \right)$
$h(\lambda_{i,j}, g_q) = \lambda_{i,j} \cdot \Delta NDCG_{i,j}$
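A sketch of the $\Delta NDCG_{i,j}$ weighting: the change in NDCG if documents at two ranks in the current list were swapped. The helper names are illustrative, base-2 discounts are assumed, and $\lambda_{i,j}$ is taken here as a fixed number rather than computed from the pairwise loss.

```python
import math

def delta_ndcg(ratings, i, j, ideal_dcg):
    # |delta NDCG_ij|: NDCG change if the docs at ranks i and j are swapped
    def gain(y, pos):
        return (2 ** y - 1) / math.log2(pos + 2)  # pos is 0-indexed
    before = gain(ratings[i], i) + gain(ratings[j], j)
    after = gain(ratings[j], i) + gain(ratings[i], j)
    return abs(after - before) / ideal_dcg

ratings = [2, 4, 3]  # ratings in the current ranked order
ideal = sum((2 ** y - 1) / math.log2(p + 2)
            for p, y in enumerate(sorted(ratings, reverse=True)))
lam = 0.25  # hypothetical lambda_ij from the pairwise loss gradient
h = lam * delta_ndcg(ratings, 0, 1, ideal)
print(h)
```

Pairs whose swap would move NDCG a lot get large weights, steering the pairwise updates toward the metric.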
Linear nets