improving recommendation for long-tail queries via templates



SLIDE 1

improving recommendation for long-tail queries via templates

Idan Szpektor Aristides Gionis Yoelle Maarek Yahoo! Research, Haifa and Barcelona

query templates www 2011

SLIDE 2

motivation

goal: improve coverage of query-recommendation systems

  • most query-recommendation systems are based on finding queries that co-occur frequently
  • observation: in a typical query log, 50% of the query volume consists of unique queries [Baeza-Yates et al., 2007]
  • this is an inherent limitation of relying on co-occurrences
  • we need methods that can reason about rare, and even previously unseen, queries

SLIDE 3

overview of the approach

1. generate candidate query-templates for each query

   Paris hotels → <city> hotels
   Paris hotels → <district> hotels
   Hyderabad hotels → <city> hotels

2. infer transitions between templates

   <city> hotels → <city> restaurants

3. infer recommendations for rare queries

   Yancheng hotels → Yancheng restaurants

highlight result: about 100% recall increase (top-10 recommendations)
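
The three steps above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the `ENTITY_TYPES` dictionary, the `TEMPLATE_TRANSITIONS` table, and the function names are hypothetical, and queries are lowercased for simplicity.

```python
# Hypothetical toy entity dictionary: entity -> list of entity types.
ENTITY_TYPES = {
    "paris": ["city", "district"],
    "hyderabad": ["city"],
    "yancheng": ["city"],
}

# Hypothetical template transitions, as step 2 might infer them
# from frequent queries such as "paris hotels" -> "paris restaurants".
TEMPLATE_TRANSITIONS = {
    "<city> hotels": ["<city> restaurants"],
}


def generalize(query):
    """Step 1: replace each known entity token with each of its types."""
    tokens = query.lower().split()
    out = []
    for i, tok in enumerate(tokens):
        for etype in ENTITY_TYPES.get(tok, []):
            slot = "<" + etype + ">"
            template = " ".join(tokens[:i] + [slot] + tokens[i + 1:])
            out.append((template, slot, tok))
    return out


def recommend(query):
    """Step 3: follow template transitions, then put the entity back."""
    recs = []
    for template, slot, entity in generalize(query):
        for target in TEMPLATE_TRANSITIONS.get(template, []):
            rec = target.replace(slot, entity)
            if rec not in recs:
                recs.append(rec)
    return recs


# A rare query with no co-occurrence data of its own still gets a recommendation.
print(recommend("Yancheng hotels"))
```

The rare query inherits a recommendation from the template transition learned on frequent queries, which is the effect the approach relies on.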

SLIDE 4

roadmap

  • query-flow graph
  • query-template flow graph
  • generating recommendations
  • experimental evaluation

SLIDE 5

the query-flow graph

[Boldi et al., 2008]

  • takes temporal information into account
  • captures the “flow” of how users submit queries

definition:

  • nodes V = Q ∪ {s, t}: the distinct set of queries Q, plus a starting state s and a terminal state t
  • edges E ⊆ V × V
  • weights w(q, q′) representing the probability that q and q′ are part of the same chain
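
A minimal sketch of this data structure in Python, under a simplifying assumption: the weight w(q, q′) is estimated here as the relative frequency with which q′ directly follows q in a session. This estimate is only an illustration and is not claimed to be the weighting used by Boldi et al.; the sessions, the `<s>`/`<t>` markers, and the function name are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<s>", "<t>"  # starting state s and terminal state t


def build_query_flow_graph(sessions):
    """Return {q: {q_next: weight}}; outgoing weights of each node sum to 1."""
    counts = defaultdict(Counter)
    for session in sessions:
        chain = [START] + list(session) + [END]
        for q, q_next in zip(chain, chain[1:]):
            counts[q][q_next] += 1
    graph = {}
    for q, successors in counts.items():
        total = sum(successors.values())
        graph[q] = {q_next: c / total for q_next, c in successors.items()}
    return graph


# Hypothetical toy sessions.
sessions = [
    ["barcelona", "barcelona hotels", "cheap barcelona hotels"],
    ["barcelona", "barcelona weather"],
    ["barcelona hotels", "luxury barcelona hotels"],
]
qfg = build_query_flow_graph(sessions)
print(qfg["barcelona"])
```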

SLIDE 6

the query-flow graph

[figure: fragment of a query-flow graph around the query “barcelona”, with nodes barcelona fc, barcelona fc fixtures, real madrid, barcelona weather, barcelona hotels, cheap barcelona hotels, luxury barcelona hotels, and <T>; edges are labeled with weights between 0.011 and 0.523]

SLIDE 7

recommendations using the query-flow graph

[Boldi et al., 2008]

  • for a given query, follow edges in the query-flow graph
  • follow the highest-probability edges
  • the graph is built from same-session queries
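
Following the highest-probability edges can be sketched as a top-k lookup over a node's outgoing edges. The graph layout (`{query: {next_query: weight}}`), the toy weights, and the function name are assumptions made for illustration only.

```python
def recommend_from_qfg(graph, query, k=3, terminal="<t>"):
    """Return up to k successors of `query`, highest weight first."""
    successors = graph.get(query, {})
    ranked = sorted(
        ((w, q) for q, w in successors.items() if q != terminal),
        key=lambda pair: (-pair[0], pair[1]),  # weight descending, then name
    )
    return [q for _, q in ranked[:k]]


# Hypothetical weights, not the values from the figure.
toy_graph = {
    "barcelona": {
        "barcelona hotels": 0.5,
        "barcelona weather": 0.3,
        "barcelona fc": 0.1,
        "<t>": 0.1,
    },
}
print(recommend_from_qfg(toy_graph, "barcelona", k=2))
print(recommend_from_qfg(toy_graph, "yancheng hotels"))  # unseen query
```

The second call returns an empty list: a query that never appeared in the log has no node in the graph, which is the coverage limitation that motivates templates.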

SLIDE 8

roadmap

  • query-flow graph
  • query-template flow graph
  • generating recommendations
  • experimental evaluation

SLIDE 9

query templates

  • defined over a hierarchy of entity types
  • a global set of templates is defined over the whole query log
  • not restricted to specific domains (such as travel, weather, or movies)

examples:

  • jaguar spare parts → <car> spare parts
  • name for salt → name for <compound>
  • a thousand miles notes → <song> notes

SLIDE 10

candidate templates – example

[figure: the n-grams of the query mapped to entity types in the hierarchy; types shown: food, dessert, drink, instruction, substance]

query: chocolate cookie recipe

candidate templates:

  • <food> cookie recipe
  • <drink> cookie recipe
  • <food> recipe
  • <substance> recipe
  • chocolate cookie <instruction>
  • . . .
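
Candidate generation can be sketched as follows: every n-gram of the query is looked up in the hierarchy, and each ancestor type yields one template together with its generalization distance. The `PARENTS` hierarchy below is a hypothetical toy chosen to reproduce the templates listed above; the real hierarchy and lookup are not specified at this level of detail in the slides.

```python
# Hypothetical toy hierarchy: entity or type -> direct generalizations.
PARENTS = {
    "chocolate": ["food", "drink"],
    "chocolate cookie": ["dessert"],
    "dessert": ["food"],
    "food": ["substance"],
    "recipe": ["instruction"],
}


def generalizations(entity):
    """Map each ancestor type of `entity` to its shortest distance (BFS)."""
    dist = {}
    frontier = [(entity, 0)]
    while frontier:
        node, d = frontier.pop(0)
        for parent in PARENTS.get(node, []):
            if parent not in dist:
                dist[parent] = d + 1
                frontier.append((parent, d + 1))
    return dist


def candidate_templates(query, max_ngram=3):
    """Return {template: distance} for every n-gram found in the hierarchy."""
    tokens = query.split()
    templates = {}
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            for etype, d in generalizations(ngram).items():
                t = " ".join(tokens[:i] + ["<" + etype + ">"] + tokens[i + n:])
                templates[t] = min(d, templates.get(t, d))
    return templates


candidates = candidate_templates("chocolate cookie recipe")
for template, distance in sorted(candidates.items()):
    print(distance, template)
```

Keeping the distance alongside each template is a design choice that makes the later scoring step (which depends on the generalization distance) straightforward.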

SLIDE 11

ranking candidate templates

ambiguity

  • Jaguar spare parts → <car> spare parts
  • Jaguar spare parts → <animal> spare parts

focus

  • name for salt → name for <compound>
  • name for salt → <description> for salt

right generalization level

  • Paris hotels → <capital> hotels
  • Paris hotels → <city> hotels
  • Paris hotels → <location> hotels

SLIDE 12

construction of query templates – details

  • queries are tokenized, and n-grams are looked up and mapped to entities in the hierarchy
  • hierarchy used: WordNet 3.0 hierarchy and Wikipedia category hierarchy, connected via the YAGO mapping
  • more than 1.7 million entities
  • more than 4.4 million generalizations
  • enriched with heuristic generalizations for <email>, <url>, numbers, and noun-phrases not in the taxonomy

SLIDE 13

query-to-template edges

  • mapping from a query q to its set of templates T(q), viewed as query-to-template edges
  • associated edge scores: $s_{qt}(q, t) = \alpha^d$ when t is obtained by generalizing q at distance d in the hierarchy H
  • parameter α set experimentally to 0.9
  • set $s_{qt}(q, q') = 1$ if (q, q′) is an edge in the query-flow graph
  • normalize so that all $s_{qt}(q, \cdot)$ sum to 1
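
The scoring rule can be sketched in a few lines of Python. The distances in the example are hypothetical, and the sketch covers only template targets; the query-to-query edges with score 1 mentioned above are left out for brevity.

```python
ALPHA = 0.9  # value reported on the slide


def query_to_template_scores(distances, alpha=ALPHA):
    """distances: {template: d}. Returns alpha**d scores normalized to sum to 1."""
    raw = {t: alpha ** d for t, d in distances.items()}
    total = sum(raw.values())
    return {t: s / total for t, s in raw.items()}


# Hypothetical generalization distances for the query "paris hotels".
distances = {"<capital> hotels": 1, "<city> hotels": 2, "<location> hotels": 3}
scores = query_to_template_scores(distances)
for template, score in scores.items():
    print(template, round(score, 3))
```

Because α < 1, templates obtained by more distant (more general) types receive lower scores, so the decay acts as a preference for specific generalizations.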

SLIDE 14

template-to-templates edges

  • reasoning about transitions between templates, e.g., <food> recipe → healthy <food> recipe
  • for templates (t1, t2), define the support set Sup(t1, t2) of query pairs (q1, q2) such that
      - t1 ∈ T(q1) and t2 ∈ T(q2)
      - t1 and t2 substitute the same token in q1 and q2 (e.g., dosa recipe and healthy dosa recipe)
  • define the template-to-template edge score as

    $$s_{tt}(t_1, t_2) = \sum_{(q_1, q_2) \in Sup(t_1, t_2)} s_{qq}(q_1, q_2)$$

    where $s_{qq}(q_1, q_2)$ is the score of the query-to-query edge (q1, q2)
  • normalize so that all $s_{tt}(t, \cdot)$ sum to 1
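
A minimal sketch of how template-to-template scores might be aggregated from supporting query pairs. It assumes the summand is the weight of the query-to-query edge, and it represents T(q) as a hypothetical dictionary from template to the substituted token (the "filler"); the example weights are invented.

```python
from collections import defaultdict


def template_transition_scores(query_edges, templates_of):
    """query_edges: {(q1, q2): score}; templates_of: {q: {template: filler}}."""
    raw = defaultdict(float)
    for (q1, q2), score in query_edges.items():
        for t1, filler1 in templates_of.get(q1, {}).items():
            for t2, filler2 in templates_of.get(q2, {}).items():
                if filler1 == filler2:  # same token substituted in both queries
                    raw[(t1, t2)] += score
    # Normalize so that the outgoing scores of each template sum to 1.
    totals = defaultdict(float)
    for (t1, _), s in raw.items():
        totals[t1] += s
    return {(t1, t2): s / totals[t1] for (t1, t2), s in raw.items()}


templates_of = {
    "dosa recipe": {"<food> recipe": "dosa"},
    "healthy dosa recipe": {"healthy <food> recipe": "dosa"},
    "pasta recipe": {"<food> recipe": "pasta"},
    "healthy pasta recipe": {"healthy <food> recipe": "pasta"},
    "easy pasta recipe": {"easy <food> recipe": "pasta"},
}
query_edges = {
    ("dosa recipe", "healthy dosa recipe"): 0.4,
    ("pasta recipe", "healthy pasta recipe"): 0.2,
    ("pasta recipe", "easy pasta recipe"): 0.2,
}
stt = template_transition_scores(query_edges, templates_of)
print(stt)
```

Two independent query pairs support the same template transition here, so its score accumulates evidence that no single rare query could provide.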

SLIDE 15

the query-template flow graph

  • extension of the query-flow graph
  • superposition of all the concepts we have seen so far:
      - the set of nodes consists of queries and templates
      - the set of edges consists of query-to-query edges, query-to-template edges, and template-to-template edges
      - associated weights

SLIDE 16

roadmap

  • query-flow graph
  • query-template flow graph
  • generating recommendations
  • experimental evaluation

SLIDE 17

generating recommendations

[figure: paths from query q to query q′ through templates t1 to t4, with edge scores s1 to s7]

$$r(q, q') = s_1 s_4 + s_2 s_5 + s_3 s_6 + s_3 s_7$$

  • interpretation: probability of a feasible path
  • dashed lines do not really exist, but are discovered on-the-fly
  • queries q and q′ may not have been seen before
  • transitions in the query-flow graph are ranked first
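
The path-sum score can be sketched as follows, assuming each path contributes the product of a query-to-template score and a template-to-template score, and that the target query is obtained on-the-fly by instantiating the target template with the same token. All names and scores are hypothetical, and each template is assumed to contain exactly one `<type>` slot.

```python
from collections import defaultdict


def recommend_via_templates(query, templates_of_query, stt, k=10):
    """templates_of_query: {template: (filler, s_qt)}; stt: {(t1, t2): s_tt}.

    Returns up to k (recommendation, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for t1, (filler, s_qt) in templates_of_query.items():
        slot = t1[t1.index("<"):t1.index(">") + 1]  # e.g. "<city>"
        for (src, t2), s_tt in stt.items():
            if src != t1:
                continue
            candidate = t2.replace(slot, filler)  # instantiate on-the-fly
            if candidate != query:
                scores[candidate] += s_qt * s_tt  # sum over all paths
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical scores for the unseen query "yancheng hotels".
templates_of_query = {
    "<city> hotels": ("yancheng", 0.6),
    "<location> hotels": ("yancheng", 0.4),
}
stt = {
    ("<city> hotels", "<city> restaurants"): 0.7,
    ("<city> hotels", "<city> map"): 0.3,
    ("<location> hotels", "<location> restaurants"): 1.0,
}
recs = recommend_via_templates("yancheng hotels", templates_of_query, stt)
print(recs)
```

In this toy run, "yancheng restaurants" is reached by two paths (through `<city>` and through `<location>`), and its score is the sum of both products, mirroring the sum of terms in r(q, q′).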

SLIDE 18

example – ambiguity

consider the query transition:

  jaguar transmission → jaguar spare parts

the template transition <car> transmission → <car> spare parts is supported by

  • bmw transmission → bmw spare parts
  • audi transmission → audi spare parts
  • . . .

the template transition <animal> transmission → <animal> spare parts will not be supported by

  • lion transmission → lion spare parts
  • tiger transmission → tiger spare parts
  • . . .

SLIDE 19

roadmap

  • query-flow graph
  • query-template flow graph
  • generating recommendations
  • experimental evaluation

SLIDE 20

methodology

methods:

  • query-template flow graph
  • query-flow graph

evaluation:

  • inspection of a sample of the results
  • editorial evaluation
  • automated evaluation

the model was built on training data and evaluated on testing data

SLIDE 21

training dataset

                     queries                  templates
# nodes              95,279,132               5,382,051,983
# edges              83,513,590               4,345,497,267
avg in/out degree    0.88                     0.81
max out-degree       14,145 (craigslist)      34,249 (<album>)
max in-degree        14,317 (youtube)         133,874 (<institution>)

SLIDE 22

anecdotal evidence

  • {“guangzhou flights”, “guangzhou map”}
    <capital> flights → <capital> map
  • {“a thousand miles notes”, “a thousand miles piano notes”}
    <single> notes → <single> piano notes
  • {“8 week old weimaraner”, “8 week old weimaraner puppy”}
    8 week old <breed> → 8 week old <breed> puppy
  • {“aaa office twin falls idaho”, “aaa twin falls idaho”}
    aaa office <city> → aaa <city>
  • {“air force titles”, “air force ranks”}
    <military service> titles → <military service> ranks
  • {“name for salt”, “chemical name for salt”}
    name for <compound> → chemical name for <compound>

SLIDE 23

editorial evaluation

  • set-A: 300 pairs from each configuration, recommendation in the top-10
  • set-B: 100 pairs, same queries in each configuration, same position
  • set-C: 100 pairs for which the query-flow graph has no recommendation
  • editors labeled query-recommendation pairs as: relevant, not relevant, cannot tell
  • two editors, 100 common queries, kappa-statistic 0.37

           qfg        qtfg
set-A      98.48%     97.84%
set-B      97.65%     98.86%
set-C      n/a        94.38%

SLIDE 24

automated evaluation – guiding principle

  • extract query pairs {$q_i$, $q_{i+1}$} from a testing dataset, such that the user submitted $q_{i+1}$ after $q_i$ in the same session
  • measure whether $q_{i+1}$ is predicted by our methods, and in which position
  • assumption: $q_{i+1}$ should be relevant and useful for $q_i$
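
This evaluation principle can be sketched as a small Python function. The metric definitions are assumptions for illustration: a pair counts as covered when the next query appears anywhere in the ranked list, and, since each pair has a single target query, average precision is taken as the reciprocal of its position. The function name and the toy recommender are hypothetical, and the sketch expects a non-empty list of test pairs.

```python
def evaluate(test_pairs, recommend, ks=(1, 10, 100)):
    """test_pairs: list of (q_i, q_next); recommend(q) returns a ranked list."""
    n = len(test_pairs)
    covered = 0
    hits = {k: 0 for k in ks}
    reciprocal_sum = 0.0
    for q, q_next in test_pairs:
        ranked = recommend(q)
        if q_next not in ranked:
            continue  # the model cannot predict this pair at all
        covered += 1
        position = ranked.index(q_next) + 1  # 1-based rank
        reciprocal_sum += 1.0 / position
        for k in ks:
            if position <= k:
                hits[k] += 1
    report = {
        "coverage": covered / n,
        "mean_reciprocal_position": reciprocal_sum / n,
    }
    for k in ks:
        report["top-%d" % k] = hits[k] / n
    return report


# Hypothetical toy recommender and test pairs.
toy_model = {"a": ["b", "c"], "d": ["e"]}
pairs = [("a", "c"), ("d", "e"), ("x", "y"), ("a", "z")]
report = evaluate(pairs, lambda q: toy_model.get(q, []))
print(report)
```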

SLIDE 25

benefits of automated evaluation

  • large-scale
  • no hard labor by humans, fast, no disagreement problems
  • captures recall: how many pairs can be covered

SLIDE 26

testing dataset

  • all-pairs: extracted all pairs of queries {$q_i$, $q_{i+1}$} within the same session (3.1 million)
  • first-last: extracted pairs of the first and the last queries within the same session (4.6 million)
  • editors evaluated a sample of 100 of those pairs: accuracy 100%

SLIDE 27

results

pair occurrences

                  qfg         qtfg        relative increase
total pairs       3134388     3134388
coverage          22.65 %     28.17 %     24.37 %
# in top-100      16.97 %     25.49 %     50.23 %
# in top-10       9.49 %      20.74 %     118.49 %
# in top-1        2.86 %      10.01 %     249.5 %
MAP               0.050       0.137
avg. position     18.35       8.3

unique pairs

                  qfg         qtfg        relative increase
total pairs       2755922     2755922
coverage          13.28 %     19.38 %     45.87 %
# in top-100      12.06 %     17.25 %     42.96 %
# in top-10       8.41 %      13.52 %     60.68 %
# in top-1        2.86 %      6.5 %       127.32 %
MAP               0.047       0.089
avg. position     12.33       9.43

SLIDE 28

results

[figure: percentage of test pairs ranked in the top-10 (y-axis) versus query length in words (x-axis), for QFG and QTFG]

SLIDE 29

results

[figure: percentage of test pairs ranked in the top-10 (y-axis) versus query frequency (x-axis), for QFG and QTFG]

SLIDE 30

conclusions

  • addressed the problem of improving the coverage of query-recommendation systems
  • approach: mapping queries to templates and learning transitions between templates
  • the method can make recommendations for rare or previously unseen queries
  • well suited for tail queries
  • complements rather than replaces existing methods
  • automated evaluation using millions of queries
  • future work: improve the quality of extracted templates

SLIDE 31

thank you!

SLIDE 32

references I

Baeza-Yates, R., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., and Silvestri, F. (2007). The impact of caching on search engines. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR).

Boldi, P., Bonchi, F., Castillo, C., Donato, D., Gionis, A., and Vigna, S. (2008). The query-flow graph: model and applications. In Proceeding of the 17th ACM conference on Information and knowledge management (CIKM).
