Use of Click Data for Web Search
Tao Yang UCSB 290N
Table of Content
– Search Engine Logs
– Eyetracking data on position bias
– Click data for ranker training [Joachims, KDD02]
– Case study: Use of click data for search ranking [
mustang ford mustang
…
www.fordvehicles.com/cars/mustang
www.mustang.com
en.wikipedia.org/wiki/Ford_Mustang
AlsoTry
Query session
[Figure: levels of user activity. A session contains missions; a mission contains queries; a query is followed by clicks; eye fixations are the finest level. Labels: Query level, Click level, Eye-tracking level]
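The session, mission, query, and click levels named above could be modeled as nested records. The sketch below is one possible layout, not the schema of any real search log: every class and field name is an assumption made for illustration, and the example queries and click position are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Click:
    url: str
    position: int                 # rank of the clicked result (hypothetical field)
    dwell_seconds: float = 0.0    # time spent on the clicked page (hypothetical field)


@dataclass
class Query:
    text: str
    clicks: List[Click] = field(default_factory=list)


@dataclass
class Mission:
    # A group of queries assumed to serve one information need.
    queries: List[Query] = field(default_factory=list)


@dataclass
class Session:
    user_id: str
    missions: List[Mission] = field(default_factory=list)

    def click_count(self) -> int:
        """Total number of clicks across all missions and queries."""
        return sum(len(q.clicks) for m in self.missions for q in m.queries)


# Made-up example: a reformulated query followed by one click.
session = Session(
    user_id="u1",
    missions=[
        Mission(
            queries=[
                Query("mustang"),
                Query(
                    "ford mustang",
                    clicks=[Click("en.wikipedia.org/wiki/Ford_Mustang", position=3)],
                ),
            ]
        )
    ],
)
```

Eye fixations are left out because they come from eye-tracking studies rather than from server logs.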
2/23/2015
CIKM'09 Tutorial, Hong Kong, China
[Figure: # of clicks received]
equally “good”?
excuses:
Google user patterns
Higher positions receive more user attention (eye fixation) and clicks than lower positions. This holds even in the extreme setting where the order of the results is reversed. “Clicks are informative but biased.”
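One simple way to look for position bias in a log is to tally which share of clicks lands on each rank. The sketch below assumes the log has already been reduced to a list of clicked positions; the function name and the toy data are hypothetical and are not taken from the eye-tracking study.

```python
from collections import Counter


def click_share_by_position(clicked_positions):
    """Return {position: fraction of all clicks} for a list of clicked ranks."""
    counts = Counter(clicked_positions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: counts[pos] / total for pos in sorted(counts)}


# Made-up clicked ranks, for illustration only.
toy_clicks = [1, 1, 1, 2, 1, 3, 2, 1]
share = click_share_by_position(toy_clicks)
```

Running the same tally on impressions shown in normal order and in reversed order would show how much of the click share follows the position rather than the result.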
[Joachims+07]
[Figure: Normal vs. Reversed Impression; axes: Position, Percentage]
Preference pairs:
Use Rank SVM to optimize
Limitation:
modeling 1 2 3 4 5 6 7 8
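The slide text on how preference pairs are formed did not survive extraction. A minimal sketch, assuming the "clicked result is preferred over unclicked results ranked above it" strategy from the cited [Joachims, KDD02] work, is shown here. The function name and the eight-result example are illustrative.

```python
def preference_pairs(ranking, clicked):
    """Return (preferred, less_preferred) pairs from one result page.

    ranking: results in the order they were shown.
    clicked: the results the user clicked.
    A clicked result is taken to be preferred over every unclicked
    result that was ranked above it.
    """
    clicked = set(clicked)
    pairs = []
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for above in ranking[:i]:
            if above not in clicked:
                pairs.append((doc, above))
    return pairs


# Eight results shown, user clicks results 1, 3, and 7.
ranking = [1, 2, 3, 4, 5, 6, 7, 8]
pairs = preference_pairs(ranking, clicked=[1, 3, 7])
```

Each pair can then be turned into a training constraint for a pairwise learner such as Rank SVM. The click on result 1 yields no pair, since nothing was skipped above it.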
Web Search Ranking by Incorporating User Behavior Information
Rank pages relevant for a query
2006
Ranking
– e.g., page terms, anchor text, term weights, term span
– e.g., web topology, spam features
for each query and result pair
behavior for the query
Presentation
– ResultPosition: position of the URL in the current ranking
– QueryTitleOverlap: fraction of query terms in the result title
Clickthrough
– DeliberationTime: seconds between query and first click
– ClickFrequency: fraction of all clicks landing on the page
– ClickDeviation: deviation from expected click frequency
Browsing
– DwellTime: result page dwell time
– DwellTimeDeviation: deviation from expected dwell time for the query
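Two of the clickthrough features above, ClickFrequency and ClickDeviation, can be sketched from a click log. The log layout `(query, url, position)` and the function name are assumptions, and the expected click frequency is estimated here as the share of all logged clicks that land on the same position, which is only one possible choice.

```python
from collections import Counter, defaultdict


def click_features(log):
    """Compute ClickFrequency and ClickDeviation per (query, url).

    log: iterable of (query, url, position) click records (assumed layout).
    """
    per_query = defaultdict(Counter)   # query -> url -> click count
    position_of = {}                   # (query, url) -> last seen position
    background = Counter()             # position -> clicks over all queries
    for query, url, position in log:
        per_query[query][url] += 1
        position_of[(query, url)] = position
        background[position] += 1

    total = sum(background.values())
    features = {}
    for query, url_counts in per_query.items():
        n = sum(url_counts.values())
        for url, c in url_counts.items():
            freq = c / n
            expected = background[position_of[(query, url)]] / total
            features[(query, url)] = {
                "ClickFrequency": freq,
                "ClickDeviation": freq - expected,
            }
    return features


# Made-up log, for illustration only.
toy_log = [("q1", "a", 1), ("q1", "a", 1), ("q1", "b", 2), ("q2", "c", 1)]
features = click_features(toy_log)
```

A positive ClickDeviation means the page attracts more clicks than its position alone would predict, which is the signal that partly corrects for position bias.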
query
Incorporate user behavior features with other categories of ranking features (e.g., text matching)
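NDCG at K: $N_q = M_q \sum_{j=1}^{K} \frac{2^{r(j)} - 1}{\log(1 + j)}$, where $r(j)$ is the relevance label of the result at position $j$ and $M_q$ is a normalization constant chosen so that a perfect ordering scores 1
MAP: for each query, the mean of the precision at K values computed after each relevant document was retrieved, averaged over all queries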
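The evaluation metrics reported in this case study (precision at K, NDCG at K, MAP) can be sketched as follows. This is a minimal version under stated assumptions: relevance labels are non-negative numbers with 0 meaning not relevant, K is at least 1, average precision is normalized by the number of relevant documents present in the list, and NDCG uses the common gain of 2^r - 1 with a log(1 + j) discount. All function names are illustrative.

```python
import math


def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant (label > 0)."""
    return sum(1 for r in rels[:k] if r > 0) / k


def average_precision(rels):
    """Mean of precision values taken at each relevant result's rank."""
    hits, total = 0, 0.0
    for j, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / j
    return total / hits if hits else 0.0


def mean_average_precision(per_query_rels):
    """Average of average_precision over a list of per-query label lists."""
    return sum(average_precision(r) for r in per_query_rels) / len(per_query_rels)


def dcg_at_k(labels, k):
    """Discounted cumulative gain over the top-k graded labels."""
    return sum(
        (2 ** r - 1) / math.log(1 + j)
        for j, r in enumerate(labels[:k], start=1)
    )


def ndcg_at_k(labels, k):
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0
```

For example, `average_precision([1, 0, 1, 0])` averages the precision at rank 1 (1/1) and at rank 3 (2/3).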
– Rerank these queries with sufficient historic click data
Rerank-All
– User behavior features + content match
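The slide text describing the click-based reranking strategy is only partly preserved. As a rough sketch of reranking queries that have enough historic click data, the function below moves clicked results to the top, ordered by click count, and leaves the rest in their original order. The threshold `min_total_clicks`, the function name, and the exact ordering rule are assumptions and are not confirmed by the slides.

```python
def rerank_by_clicks(ranking, click_counts, min_total_clicks=10):
    """Reorder one query's results using historic click counts.

    ranking: URLs in their original order.
    click_counts: {url: number of past clicks for this query}.
    min_total_clicks: hypothetical cutoff for "sufficient" click data.
    """
    total = sum(click_counts.get(u, 0) for u in ranking)
    if total < min_total_clicks:
        return list(ranking)  # too little evidence: keep the original order

    clicked = [u for u in ranking if click_counts.get(u, 0) > 0]
    unclicked = [u for u in ranking if click_counts.get(u, 0) == 0]
    # sort() is stable, so ties keep their original relative order.
    clicked.sort(key=lambda u: -click_counts[u])
    return clicked + unclicked


reranked = rerank_by_clicks(["a", "b", "c", "d"], {"c": 8, "a": 3})
```

Using raw click counts this way inherits the position bias discussed earlier, which is one reason to combine behavior features with content features in a learned ranker instead.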
[Figure: Precision at K (K = 1, 3, 5, 10) for BM25, Rerank-CT, Rerank-All, BM25+All]
[Figure: NDCG at K (K = 1 to 10) for BM25, Rerank-CT, Rerank-All, BM25+All]
[Figure: Frequency and Average Gain]
Most gains are for queries with poor ranking
ranking dramatically improves relevance
the most effective strategy
queries
Method      MAP     Gain
RN          0.270
RN+ALL      0.321   0.052 (19.13%)
BM25        0.236
BM25+ALL    0.292   0.056 (23.71%)
[Figure: NDCG at K (K = 1 to 10) for RN, Rerank-All, RN+All]