

SLIDE 1

Document Selection Methodologies for Efficient and Effective Learning-to-Rank

Javed Aslam, Evangelos Kanoulas, Virgil Pavlu, Stefan Savev, Emine Yilmaz


SLIDE 2

Search Engines

[Diagram] User's request → Search Engine → Document Corpus → Results. Ranking uses hundreds of features: BM25, tf*idf, PageRank, …

SLIDE 3

Training Search Engines

[Diagram] A search engine over a document corpus (BM25, tf*idf, PageRank, …) is trained from queries; judges supply relevance labels and a metric guides the learner. Candidate learners:
1. Neural Network
2. Support Vector Machine
3. Regression Function
4. Decision Tree
…

SLIDE 4

Training Data Sets

• Data Collections
  – Billions of documents
  – Thousands of queries
• Ideal, in theory; infeasible, in practice…
  – Extract features from all query-document pairs
  – Judge each document with respect to each query
• Extensive human effort
  – Train over all query-document pairs

SLIDE 5

Training Data Sets

• Train the ranking function over a subset of the complete collection
• Few queries with many documents judged vs. many queries with few documents judged
  – Better to train over many queries with few judged documents [Yilmaz and Robertson '09]
• How should we select documents?

SLIDE 6

Training Data Sets

• Machine Learning (Active Learning)
  – Iterative process
  – Tightly coupled with the learning algorithm
• IR Evaluation
  – Many test collections already available
  – Efficient and effective techniques to construct test collections
• Intelligent way of selecting documents
• Inferences of effectiveness metrics

SLIDE 7

Duality between LTR and Evaluation

• This work: Explore the duality between Evaluation and Learning-to-Rank
  – Employ techniques used for efficient and effective test collection construction to construct training collections

SLIDE 8

Duality between LTR and Evaluation

• Can test collection construction methodologies be used to construct training collections?
• If yes, which one of these methodologies is better?
• What makes one training set better than the other?

SLIDE 9

Methodology

• Depth-100 pool (as the complete collection)
• Select subsets of documents from the depth-100 pool
  – Using different document selection methodologies
• Train over the different training sets
  – Using a number of learning-to-rank algorithms
• Test the performance of the resulting ranking functions
  – Five-fold cross-validation (see the sketch below)
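The whole experimental loop on this slide fits in a short sketch. This is our illustration, not the authors' code: `select`, `train`, and `score` are hypothetical caller-supplied callables standing in for a document selection method, an LTR trainer, and a MAP scorer.

```python
# Minimal sketch of the experimental design, assuming three hypothetical
# callables: select(pool, queries, fraction) -> training set,
# train(training_set) -> ranker, score(ranker, queries, pool) -> MAP.
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(queries, pool, select, train, score, fraction, n_folds=5):
    """Five-fold cross-validation over queries for one (selection, LTR) pair."""
    fold_scores = []
    splitter = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, test_idx in splitter.split(queries):
        train_queries = [queries[i] for i in train_idx]
        test_queries = [queries[i] for i in test_idx]
        # Build the training set from the depth-100 pool, train, then test.
        training_set = select(pool, train_queries, fraction)
        ranker = train(training_set)
        fold_scores.append(score(ranker, test_queries, pool))
    return float(np.mean(fold_scores))
```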


SLIDE 10

Data Sets

• Data from TREC 6, 7, and 8
  – Document corpus: TREC Discs 4 and 5
  – Queries: 150 queries; ad-hoc tracks
  – Relevance judgments: depth-100 pools
• Features from each query-document pair
  – 22 features; subset of LETOR features (BM25, Language Models, TF-IDF, …; a BM25 sketch follows below)
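As a concrete example of one such query-document feature, here is standard Okapi BM25 in a hedged sketch; the parameters k1 and b are common defaults, not necessarily the values behind the paper's LETOR-style features.

```python
# Standard Okapi BM25 as one example query-document feature; k1 and b are
# common defaults, not necessarily the settings used in the paper.
import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """doc_tf: term -> frequency in this document; df: term -> document frequency."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # term absent from the document contributes nothing
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```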


SLIDE 11

Document Selection Methodologies

• Select subsets of documents
  – Subset size varying from 6% to 60%
1. Depth-k pooling
2. InfAP (uniform random sampling)
3. StatAP (stratified random sampling)
4. MTC (greedy on-line algorithm)
5. LETOR (top-k by BM25; current practice)
6. Hedge (greedy on-line algorithm)

(Sketches of depth-k pooling and infAP-style sampling follow below.)
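Two of the listed strategies are simple enough to sketch directly. The function names and the representation of system runs as ranked lists of document ids are our assumptions, not the authors' implementations.

```python
# Illustrative sketches of two listed strategies; names and data layout are ours.
import random

def depth_k_pool(runs, k):
    """Depth-k pooling: the union of the top-k documents of every system's run."""
    pool = set()
    for ranked_docs in runs:
        pool.update(ranked_docs[:k])
    return pool

def infap_sample(pool, fraction, seed=0):
    """infAP-style selection: a uniform random sample of the pool."""
    docs = sorted(pool)  # fixed order so the sample is reproducible
    n = max(1, round(fraction * len(docs)))
    return set(random.Random(seed).sample(docs, n))
```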

SLIDE 12

Document Selection Methodologies

• Precision: fraction of selected documents that are relevant
• Discrepancy: symmetrized KL divergence between the documents' language models

(A sketch of both diagnostics follows below.)

[Figure] Precision of the selection methods, and discrepancy (symmetrized KL divergence) between relevant and nonrelevant documents, each vs. percentage of data used for training; curves for depth, hedge, infAP, MTC, statAP, LETOR.
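Both diagnostics can be made concrete with a short sketch, assuming unigram language models represented as term → probability dicts over a shared, smoothed vocabulary (so no probability is zero); the function names are ours.

```python
# Sketch of the two diagnostics; assumes smoothed language models over a
# shared vocabulary so every probability is strictly positive.
import math

def selection_precision(selected, relevant):
    """Fraction of selected documents that are relevant."""
    return len(selected & relevant) / len(selected)

def symmetrized_kl(p, q):
    """KL(p||q) + KL(q||p) between two unigram language models."""
    kl_pq = sum(pt * math.log(pt / q[t]) for t, pt in p.items())
    kl_qp = sum(qt * math.log(qt / p[t]) for t, qt in q.items())
    return kl_pq + kl_qp
```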

SLIDE 13

LTR Algorithms

• Train over the different data sets
1. Regression (classification error)
2. Ranking SVM (AUC)
3. RankBoost (pairwise preferences)
4. RankNet (probability of correct order; see the sketch below)
5. LambdaRank (nDCG)
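RankNet's parenthetical, "probability of correct order", refers to its pairwise objective: the score difference of two documents is mapped through a sigmoid to the probability that the first should rank above the second, trained with cross-entropy. A minimal sketch of that objective (variable names ours):

```python
# RankNet's pairwise objective: P(i above j) = sigmoid(s_i - s_j), trained
# with cross-entropy against the judged preference.
import math

def pair_prob(s_i, s_j):
    """Modeled probability that document i should be ranked above document j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def pair_loss(s_i, s_j, target=1.0):
    """Cross-entropy between the target preference and the modeled probability."""
    p = pair_prob(s_i, s_j)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
```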

SLIDE 14

Results (1)

[Figure] MAP vs. percentage of data used for training, for Regression and Ranking SVM; curves for depth, hedge, infAP, MTC, statAP, LETOR.

SLIDE 15

Results (2)

[Figure] MAP vs. percentage of data used for training, for LambdaRank and RankBoost; curves for depth, hedge, infAP, MTC, statAP, LETOR.

SLIDE 16

Results (3)

[Figure] MAP vs. percentage of data used for training, for RankNet and RankNet (hidden layer); curves for depth, hedge, infAP, MTC, statAP, LETOR.

SLIDE 17

Observations (1)

• Some Learning-to-Rank algorithms are robust to document selection methodologies
  – LambdaRank vs. RankBoost

[Figure] MAP vs. percentage of data used for training for LambdaRank and RankBoost (as in Results (2)).

SLIDE 18

Observations (2)

• Near-optimal performance with 1%-2% of the complete collection (depth-100 pool)
  – No significant differences at greater percentages (t-test; a sketch follows below)
  – Number of features matters [Taylor et al. '06]

[Figure] MAP vs. percentage of data used for training for RankNet (as in Results (3)).
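The significance check mentioned above can be sketched as a paired t-test over matched MAP scores; pairing by fold (or by query) and the use of scipy are our assumptions.

```python
# Paired t-test over matched MAP scores (e.g., per fold or per query) for two
# training-set sizes; the pairing scheme is our assumption.
from scipy.stats import ttest_rel

def significantly_different(maps_a, maps_b, alpha=0.05):
    """True if the two matched samples of MAP scores differ significantly."""
    t_stat, p_value = ttest_rel(maps_a, maps_b)
    return p_value < alpha
```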

SLIDE 19

Observations (3)

• Selection methodology matters
  – Hedge (worst performance)
  – Depth-k pooling and statAP (best performance)
  – LETOR-like (neither most efficient nor most effective)

[Figure] MAP vs. percentage of data used for training for Ranking SVM (as in Results (1)).

SLIDE 20

Relative Importance on Effectiveness

• Learning-to-Rank algorithm vs. document selection methodology
  – 2-way ANOVA model (see the sketch below)
• Variance decomposition over all data sets
  – 26% due to document selection
  – 31% due to LTR algorithm
• Variance decomposition (small data sets, <10%)
  – 44% due to document selection
  – 31% due to LTR algorithm
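A two-way ANOVA decomposition like the one reported here can be sketched with statsmodels, assuming a long-format table with one MAP value per (selection method, LTR algorithm) cell; the column names are illustrative, not from the paper.

```python
# Sketch of the 2-way ANOVA decomposition; expects a DataFrame with columns
# 'MAP', 'selection', 'algorithm' (column names are ours).
import statsmodels.api as sm
from statsmodels.formula.api import ols

def variance_decomposition(df):
    """Percent of MAP variance attributable to each factor (plus residual)."""
    model = ols("MAP ~ C(selection) + C(algorithm)", data=df).fit()
    table = sm.stats.anova_lm(model, typ=2)
    return 100 * table["sum_sq"] / table["sum_sq"].sum()
```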


SLIDE 21

What makes one training set better than another?

• Different methods have different properties
  – Precision
  – Recall
  – Similarities between relevant documents
  – Similarities between relevant and non-relevant documents
  – ...
• Model selection

SLIDE 22

What makes one training set better than another?

• Different methods have different properties
  – Precision
  – Recall
  – Similarities between relevant documents
  – Similarities between relevant and non-relevant documents
  – ...
• Model selection
  – Linear model (adjusted R² = 0.99; see the sketch below)
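The model-selection step can be sketched as an ordinary least-squares fit of MAP on the training-set properties listed above; the feature-matrix layout is our assumption, and the underlying data are not reproduced here.

```python
# OLS fit of MAP on training-set properties (precision, recall, similarity
# features, ...); X holds rows of properties, y the MAP achieved when training
# on the corresponding set (layout is ours, not the authors').
import numpy as np
import statsmodels.api as sm

def fit_property_model(X, y):
    """Returns the fitted model; model.rsquared_adj gives adjusted R^2."""
    return sm.OLS(np.asarray(y), sm.add_constant(np.asarray(X))).fit()
```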


SLIDE 23

What makes one training set better than another?

[Figure] MAP vs. precision of the training data, for Ranking SVM and RankBoost.

SLIDE 24

What makes one training set better than another?

[Figure] MAP vs. discrepancy (symmetrized KL divergence) between relevant and nonrelevant documents in the training data, for RankBoost and Ranking SVM.

SLIDE 25

Conclusions

• Some LTR algorithms are robust to document selection methodologies
• For those that are not, the selection methodology matters
  – Depth-k pooling, stratified sampling
• Harmful to select too many relevant docs
• Harmful to select relevant and non-relevant docs that are too similar