Document Selection Methodologies for Efficient and Effective Learning-to-Rank - PowerPoint PPT Presentation


1. Document Selection Methodologies for Efficient and Effective Learning-to-Rank
Javed Aslam, Evangelos Kanoulas, Virgil Pavlu, Stefan Savev, Emine Yilmaz


2. Search Engines
[Diagram: a user's request goes to the search engine, which ranks the document corpus and returns results]
Ranking signals: BM25, tf-idf, PageRank, … (hundreds of features)
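As an example of one such ranking signal, here is a minimal sketch of the classic BM25 term weight. This is the standard Robertson formulation; the k1 and b defaults are assumptions, since the slides only name the feature.

```python
import math

def bm25_term(tf, df, N, dl, avgdl, k1=1.2, b=0.75):
    """BM25 contribution of one query term to a document's score.
    tf: term frequency in the doc; df: number of docs containing the term;
    N: corpus size; dl/avgdl: doc length and average doc length.
    k1 and b are common default constants, not values from the slides."""
    idf = math.log((N - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
```

A learning-to-rank system combines many such signals into a single feature vector per query-document pair.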


3. Training Search Engines
[Diagram: queries flow into the search engine (BM25, tf-idf, PageRank, … over the document corpus); judges and an effectiveness metric drive the training of the ranking function]
Candidate learners:
1. Neural Network
2. Support Vector Machine
3. Regression Function
4. Decision Tree
…


4. Training Data Sets
• Data Collections
– Billions of documents
– Thousands of queries
• Ideal, in theory; infeasible, in practice…
– Extract features from all query-document pairs
– Judge each document with respect to each query
• Extensive human effort
– Train over all query-document pairs


5. Training Data Sets
• Train the ranking function over a subset of the complete collection
• Few queries with many documents judged vs. many queries with few documents judged
– Better to train over many queries with few judged documents [Yilmaz and Robertson '09]
• How should we select documents?


6. Training Data Sets
• Machine Learning (Active Learning)
– Iterative process
– Tightly coupled with the learning algorithm
• IR Evaluation
– Many test collections already available
– Efficient and effective techniques to construct test collections
• Intelligent way of selecting documents
• Inferences of effectiveness metrics


7. Duality between LTR and Evaluation
• This work: explore the duality between Evaluation and Learning-to-Rank
– Employ techniques used for efficient and effective test collection construction to construct training collections


8. Duality between LTR and Evaluation
• Can test collection construction methodologies be used to construct training collections?
• If yes, which of these methodologies is better?
• What makes one training set better than another?


9. Methodology
• Depth-100 pool (as the complete collection)
• Select subsets of documents from the depth-100 pool
– Using different document selection methodologies
• Train over the different training sets
– Using a number of learning-to-rank algorithms
• Test the performance of the resulting ranking functions
– Five-fold cross-validation (see the sketch below)
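A minimal sketch of the query-level five-fold split implied above. The fold count comes from the slide; the fixed seed and round-robin fold assignment are illustrative assumptions.

```python
import random

def five_fold_splits(query_ids, seed=0):
    """Yield (train, test) query-ID splits for five-fold cross-validation.
    Splitting is by query, so all documents of a query stay in one fold."""
    ids = list(query_ids)
    random.Random(seed).shuffle(ids)          # illustrative fixed seed
    folds = [ids[i::5] for i in range(5)]     # round-robin into 5 folds
    for k in range(5):
        test = set(folds[k])
        yield [q for q in ids if q not in test], sorted(test)

# Example: 150 TREC queries, as in the data sets described on the next slide.
for train_q, test_q in five_fold_splits(range(1, 151)):
    pass  # select documents for train_q, train a ranker, evaluate MAP on test_q
```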


10. Data Sets
• Data from TREC 6, 7, and 8
– Document corpus: TREC Discs 4 and 5
– Queries: 150 queries; ad-hoc tracks
– Relevance judgments: depth-100 pools
• Features from each query-document pair
– 22 features; subset of LETOR features (BM25, Language Models, TF-IDF, …)


11. Document Selection Methodologies
Select subsets of documents
• Subset size varying from 6% to 60%
1. Depth-k pooling
2. InfAP (uniform random sampling)
3. StatAP (stratified random sampling)
4. MTC (greedy on-line algorithm)
5. LETOR (top-k by BM25; current practice)
6. Hedge (greedy on-line algorithm)
(the first two strategies are sketched below)
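To make the list concrete, a minimal sketch of depth-k pooling and infAP-style uniform sampling. Function names and the seed are illustrative, and the real infAP/statAP methods pair the sampling with sampling-aware metric estimators not shown here.

```python
import random

def depth_k_pool(ranked_runs, k):
    """Depth-k pooling: union of the top-k documents from every
    system's ranked run for a query."""
    pool = set()
    for run in ranked_runs:       # each run is a list of doc IDs, best first
        pool.update(run[:k])
    return pool

def uniform_sample(pool, fraction, seed=0):
    """infAP-style selection: a uniform random sample of the pool."""
    docs = sorted(pool)
    n = max(1, round(fraction * len(docs)))
    return set(random.Random(seed).sample(docs, n))
```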


12. Document Selection Methodologies
[Figure: two panels plotted against the percentage of data used for training; left, precision of the selection methods; right, discrepancy (symmetrized KL divergence) between relevant and non-relevant documents; one curve per method: depth, hedge, infAP, mtc, statAP, LETOR]
• Precision: fraction of selected documents that are relevant
• Discrepancy: symmetrized KL divergence between documents' language models
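A minimal sketch of the discrepancy measure, assuming unigram language models represented as term-to-probability dicts that are already smoothed so each covers the other's support (the slides do not specify the smoothing).

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) between two unigram language models."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def symmetrized_kl(p, q):
    """Symmetrized KL divergence, the 'discrepancy' plotted above."""
    return kl(p, q) + kl(q, p)
```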


13. LTR Algorithms
• Train over the different data sets
1. Regression (classification error)
2. Ranking SVM (AUC)
3. RankBoost (pairwise preferences)
4. RankNet (probability of correct order)
5. LambdaRank (nDCG)
(RankNet's pairwise objective is sketched below)
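For intuition on the "probability of correct order" objective, a minimal RankNet-style sketch; sigma is the usual sigmoid scale parameter, and its value here is an assumption.

```python
import math

def ranknet_pair_loss(s_i, s_j, sigma=1.0):
    """RankNet models P(i ranked above j) = sigmoid(sigma * (s_i - s_j))
    and minimizes the cross-entropy of that probability against the
    judged preference that document i should outrank document j."""
    p_ij = 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))
    return -math.log(p_ij)
```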


14. Results (1)
[Figure: MAP vs. percentage of data used for training, for Regression and Ranking SVM; curves for depth, hedge, infAP, MTC, statAP, LETOR]

15. Results (2)
[Figure: MAP vs. percentage of data used for training, for RankBoost and LambdaRank; same six selection methods]

16. Results (3)
[Figure: MAP vs. percentage of data used for training, for RankNet and RankNet with a hidden layer; same six selection methods]

17. Observations (1)
• Some Learning-to-Rank algorithms are robust to document selection methodologies
– LambdaRank vs. RankBoost
[Figure: MAP vs. percentage of data used for training, for LambdaRank and RankBoost]

18. Observations (2)
• Near-optimal performance with 1%-2% of the complete collection (depth-100 pool)
– No significant differences at greater percentages (t-test)
– Number of features matters [Taylor et al. '06]
[Figure: MAP vs. percentage of data used for training, for RankNet]

19. Observations (3)
• Selection methodology matters
– Hedge (worst performance)
– Depth-k pooling and statAP (best performance)
– LETOR-like (neither most efficient nor most effective)
[Figure: MAP vs. percentage of data used for training, for Ranking SVM]

20. Relative Importance on Effectiveness
• Learning-to-Rank algorithm vs. document selection methodology
– 2-way ANOVA model (a minimal sketch follows)
• Variance decomposition over all data sets
– 26% due to document selection
– 31% due to LTR algorithm
• Variance decomposition (small data sets, <10%)
– 44% due to document selection
– 31% due to LTR algorithm
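A minimal sketch of a two-way ANOVA variance decomposition of this kind, using statsmodels; the tiny inline table and column names are illustrative stand-ins, not the paper's data.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Illustrative stand-in data: one MAP score per (selection, algorithm) cell.
df = pd.DataFrame({
    "score":     [0.21, 0.17, 0.22, 0.20, 0.15, 0.21],
    "selection": ["depth", "hedge", "statAP"] * 2,
    "algorithm": ["SVM"] * 3 + ["RankBoost"] * 3,
})

# Two-way (additive) ANOVA: MAP explained by selection method and LTR algorithm.
model = ols("score ~ C(selection) + C(algorithm)", data=df).fit()
table = sm.stats.anova_lm(model, typ=2)

# Fraction of total variance attributable to each factor, as on the slide.
print(table["sum_sq"] / table["sum_sq"].sum())
```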


21. What makes one training set better than another?
• Different methods have different properties
– Precision
– Recall
– Similarities between relevant documents
– Similarities between relevant and non-relevant documents
– ...
• Model selection


22. What makes one training set better than another?
• Different methods have different properties
– Precision
– Recall
– Similarities between relevant documents
– Similarities between relevant and non-relevant documents
– ...
• Model selection
– Linear model (adjusted R² = 0.99)
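For reference, adjusted R² is the usual goodness-of-fit statistic penalized for model size; the slide reports only the value, so the counts below are generic:

```latex
\bar{R}^{2} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1}
```

Here n would be the number of training sets observed and p the number of training-set properties entering the linear model.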


23. What makes one training set better than another?
[Figure: MAP vs. precision of the training set, for RankBoost and Ranking SVM]

24. What makes one training set better than another?
[Figure: MAP vs. discrepancy between relevant and non-relevant documents in the training data, for RankBoost and Ranking SVM]

25. Conclusions
• Some LTR algorithms are robust to document selection methodologies
• For those that are not, the selection methodology matters
– Depth-k pooling, stratified sampling (best performers)
• Harmful to select too many relevant docs
• Harmful to select relevant and non-relevant docs that are too similar

