Searching with Context
Reiner Kraft Farzin Maghoul Chi Chao Chang Ravi Kumar
Yahoo!, Inc., Sunnyvale, CA 94089, USA
Agenda
– Motivation – Contextual Search Introduction – Case Study: Y!Q – Algorithms – Query Rewriting
Yahoo! Confidential
– Few qualitative differences between the search results of major search engines – Anchor text and link analysis (1998) were the last major features introduced to improve search relevancy
precise our results can be
query box from which we can infer better knowledge of information need
manual form of contextual search by using additional terms to refine and reissue queries when the search results for the initial query turn out to be unsatisfactory
refining, and improving a user’s search query to obtain more relevant results?
– In general: Any additional information associated with a query – More narrowly: A piece of text (e.g., a few words, a sentence, a paragraph, an article) that has been authored by someone
– Dense representation of a context in the vector space model – Obtained using keyword extraction algorithms (e.g., Wen-tau Yih et al., KEA, Y! Content Analysis)
– Simple: Few keywords, no special or expensive operators – Complex: Keywords/phrases plus special ranking operators, more expensive to evaluate – Contextual: Query + context term vector
– Standard: Web search engines (e.g., Yahoo, Google, MSN, …) that support simple queries – Modified: A Web search engine that has been modified to support complex search queries
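To make the notion of a context term vector concrete, here is a minimal sketch. It uses raw term frequency as a stand-in for a real keyword extractor (it is not KEA or Y! Content Analysis), and the function name, stop-word list, and weights are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "by", "on", "as", "to", "was", "it", "why",
              "of", "and", "in", "is"}


def context_term_vector(text, max_terms=5):
    """Return [(term, weight), ...] sorted by descending weight.

    The weight is the raw term frequency, standing in for the scores
    a real keyword extraction algorithm would assign.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    # Sort by weight, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:max_terms]


context = ("Quincy Carter was cut by the Dallas Cowboys. "
           "The Cowboys would not say why they released Carter.")
print(context_term_vector(context, max_terms=3))
```

A production extractor would weigh phrases, position, and corpus statistics; the sketch only shows the shape of the output, a short ranked list of weighted terms.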
– Y!Q provides a simple API that allows publishers to associate visual information widgets (actuators) with parts of page content (http://yq.search.yahoo.com/publisher/embed.html) – Y!Q lets users manually specify or select context (e.g., within Y! Toolbar, Y! Messenger, included JavaScript library)
– Generates a digest (context term vector) of the associated content piece as additional terms of interest for augmenting queries (content analysis) – Knows how to perform contextual searches for different search back-end providers (query rewriting framework) – Knows how to rank results based on query + context (contextual ranking) – Seamless integration by displaying results in overlay or embedded within page without interrupting the user’s workflow
Terms extracted from context
– We have a query plus a context term vector (contextual search query)
– Number of queries to send to a search engine per contextual search query – Types of queries to send
– Query Rewriting (QR) – Rank-Biasing (RB) – Iterative, Filtering, Meta-Search (IFM)
– Query, context term vector – Number of terms to consider from context term vector
– QR1 (takes top term only) – QR2 (takes top two terms only) – … up to QR5
– QR3: Given query q and context term vector (a, b, c) => q AND a AND b AND c
– Simplicity, supported in all major search engines
– Possibly low recall for longer queries
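The query rewriting scheme above can be sketched in a few lines. This is an illustration under stated assumptions, not the Y!Q implementation: the function name is hypothetical, the term weights are illustrative, and the context term vector is assumed to be sorted by descending weight.

```python
def rewrite_query(query, term_vector, num_terms):
    """Build the QR<num_terms> query: the original query ANDed with the
    top `num_terms` terms of the context term vector."""
    top_terms = [term for term, _weight in term_vector[:num_terms]]
    return " AND ".join([query] + top_terms)


# Illustrative context term vector as (term, weight) pairs.
vector = [("a", 50), ("b", 25), ("c", 12)]

print(rewrite_query("q", vector, 1))  # QR1: q AND a
print(rewrite_query("q", vector, 3))  # QR3: q AND a AND b AND c
```

Because every added term is a conjunct, each extra term can only shrink the result set, which is where the recall concern for longer queries comes from.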
– Selection part – Optional ranking terms only affect the score of selected documents
– Query, context term vector – Number of selection terms to consider (conjunctive semantics) – Number of RANK operators – Weight multiplier for each RANK operator (used for scaling)
– RB2 (uses 1 selection term, 2 RANK operators, weight multiplier=0.1) – RB6 (uses 2 selection terms, 6 RANK operators, weight multiplier=0.01)
– RB2: Given q and context term vector (a: 50, b: 25, c: 12) => q AND a RANK(b, 2.5) RANK(c, 1.2)
– Ranking terms do not limit recall
– Requires a modified search engine back-end, more expensive to evaluate
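A minimal sketch of how a rank-biased query string could be assembled from the parameters listed above. The function name and the exact query syntax are assumptions modelled on the RB2 example; the weights 50, 25, and 12 are those shown on the slide, scaled by the weight multiplier.

```python
def rank_biased_query(query, term_vector, num_selection, num_rank, multiplier):
    """Build a rank-biased query.

    The first `num_selection` terms are ANDed with the query (selection
    part); the next `num_rank` terms become RANK operators whose weight
    is the term weight scaled by `multiplier`.
    """
    selection = [term for term, _w in term_vector[:num_selection]]
    rank_terms = term_vector[num_selection:num_selection + num_rank]
    parts = [" AND ".join([query] + selection)]
    for term, weight in rank_terms:
        # ":g" trims floating-point noise such as 1.2000000000000002.
        parts.append(f"RANK({term}, {weight * multiplier:g})")
    return " ".join(parts)


vector = [("a", 50), ("b", 25), ("c", 12)]

# RB2 configuration: 1 selection term, 2 RANK operators, multiplier 0.1.
print(rank_biased_query("q", vector, 1, 2, 0.1))
# q AND a RANK(b, 2.5) RANK(c, 1.2)
```

Only the selection part decides which documents are retrieved; the RANK terms merely reorder them, so a standard engine without such an operator cannot evaluate this query.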
Finder [kraft, stata, 2003])
– Sends multiple (simple) queries to possibly multiple search engines – Combines results using rank aggregation methodologies
– Query templates specify how sub-queries get constructed from the pool of candidate terms – Allow exploring the problem domain in a systematic way – Primarily implemented a sliding window technique using query templates – Example: Given query q and a context term vector, a sliding window query template of size 2 may construct the following queries:
– Size of the sliding window
– IFM-SW1, IFM-SW2, IFM-SW3, IFM-SW4
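The sliding window template can be illustrated with a short sketch. The queries from the original example are not available here, so the output below is only what this particular sketch produces; the function name, the example vector, and the use of a plain space to join terms (a simple query) are assumptions.

```python
def sliding_window_queries(query, term_vector, window_size):
    """Generate one simple sub-query per window of consecutive context terms."""
    terms = [term for term, _w in term_vector]
    # At least one query is always produced, even if the window is
    # larger than the number of available terms.
    num_windows = max(len(terms) - window_size + 1, 1)
    return [" ".join([query] + terms[i:i + window_size])
            for i in range(num_windows)]


# Illustrative context term vector, assumed sorted by descending weight.
vector = [("a", 50), ("b", 25), ("c", 12)]

for sub_query in sliding_window_queries("q", vector, 2):
    print(sub_query)
```

Each sub-query would then be sent to a standard search engine and the resulting lists merged by rank aggregation.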
– Combine k lists $\pi_1, \dots, \pi_k$ into $\pi^*$, such that $\sum_{i=1}^{k} d(\pi^*, \pi_i)$ is minimized – For $d(\cdot,\cdot)$ we used various distance functions (e.g., Spearman footrule, Kendall tau)
– Style of rank aggregation:
– IFM-RA, IFM-MC4
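The aggregation objective, minimizing the total distance from the aggregate to each input list, can be made concrete with a brute-force sketch. It assumes every input is a full ranking of the same small set of items, which real result lists are not, and it is not the MC4 method; all names are hypothetical.

```python
from itertools import combinations, permutations


def footrule(r1, r2):
    """Spearman footrule: sum of absolute position differences."""
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(abs(i - pos2[item]) for i, item in enumerate(r1))


def kendall_tau(r1, r2):
    """Kendall tau distance: number of item pairs ordered differently."""
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(1 for x, y in combinations(r1, 2) if pos2[x] > pos2[y])


def aggregate(rankings, distance):
    """Return the permutation with the smallest total distance to all
    input rankings. Exhaustive search, so only usable for tiny lists."""
    items = rankings[0]
    return min(permutations(items),
               key=lambda cand: sum(distance(cand, r) for r in rankings))


rankings = [
    ["d1", "d2", "d3"],
    ["d1", "d3", "d2"],
    ["d2", "d1", "d3"],
]
print(aggregate(rankings, kendall_tau))  # ('d1', 'd2', 'd3')
print(aggregate(rankings, footrule))     # ('d1', 'd2', 'd3')
```

Exhaustive search grows factorially with the number of items, which is why practical systems rely on heuristics or approximations for this step.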
– 200 contexts sampled from Y!Q query logs
– 15 QR (Yahoo, MSN, Google) – 18 RB (1 or 2 selection terms; 2, 4, or 6 RANK operators, 0.01, 0.1,
– 8 IFM (avg and MC4 on Yahoo, SW1 to SW4)
– Relevancy to the context, perceived relevancy used – Relevancy Judgments:
– 28 expert judges, look at top 3 results, total of 24,556 judgments
– “Cowboys Cut Carter; Testaverde to Start OXNARD, Calif Quincy Carter was cut by the Dallas Cowboys on Wednesday, leaving 40-year-old Vinny Testaverde as the starting quarterback. The team wouldn’t say why it released Carter.”
– A result directly relating to the “Dallas Cowboys” (football team) or Quincy Carter => Yes – A result repeating the same or similar information => Somewhat – A result about Jimmy Carter, the former U.S. president => No – If result doesn’t provide sufficient information => Can’t tell
– Number of relevant results divided by the number of retrieved results, but capped at 1 or 3, and expressed as a ratio – A result is considered relevant if and only if it receives a ‘Y’ relevant judgment
– Number of relevant results divided by the number of retrieved results, but capped at 1 or 3, and expressed as a ratio – A result is considered relevant if and only if it receives a ‘Y’
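The capped precision metric can be sketched as follows. The label codes 'Y', 'S', and 'N' are assumed shorthand for the Yes, Somewhat, and No judgments; counting only 'Y' follows the definition above, while the looser variant that also counts 'S' is an assumption, and returning 0.0 when nothing was retrieved is a choice made for this sketch.

```python
def precision_at_k(judgments, k, relevant_labels):
    """Relevant results divided by retrieved results, looking at no
    more than the top k results."""
    top = judgments[:k]
    if not top:
        return 0.0  # Assumption: an empty result list scores zero.
    relevant = sum(1 for label in top if label in relevant_labels)
    return relevant / len(top)


# Judgments for the top three results of one contextual query.
judgments = ["Y", "S", "N"]

strict_at_1 = precision_at_k(judgments, 1, {"Y"})
strict_at_3 = precision_at_k(judgments, 3, {"Y"})
loose_at_3 = precision_at_k(judgments, 3, {"Y", "S"})  # assumed variant
print(strict_at_1, strict_at_3, loose_at_3)
```

Dividing by the number of results actually retrieved, rather than by k, means a configuration that returns a single relevant result still scores 1.0, so these metrics should be read alongside the coverage figures.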
– Substantial drop in recall as the number of vector entries in QR increases (expected); comparable between MSN and Yahoo, roughly one order of magnitude less on Google – For QR4 using MSN or Yahoo, low recall may affect user experience – The RB configurations tested have the same recall as QR2 – IFM works on a substantially larger set of candidate results
[Figure: Coverage Drop (= 0). Percentage of contexts by configuration (QR1 to QR5) for MSN, Yahoo, and Google. Data labels: MSN 1, 6, 9; Yahoo 1, 6, 9; Google 3, 4, 5.]
[Figure: Coverage Drop (< 3). Percentage of contexts by configuration (QR1 to QR5) for MSN, Yahoo, and Google. Data labels: MSN 1, 11, 21, 28; Yahoo 3, 11, 20, 26; Google 4, 7, 12.]
– Use P@1, P@3, SP@1, SP@3 metrics – SP drops sharply for MSN, Yahoo beyond QR4 (recall issues) – Optimal operating point for MSN, Yahoo QR3/QR4, Google QR5 – QR4 uses 7.3 terms avg., QR5 uses 8.51 terms avg.
Strong Precision @ 3
          QR1    QR2    QR3    QR4    QR5
MSN     0.250  0.364  0.390  0.396  0.358
Yahoo   0.250  0.375  0.397  0.416  0.394
Google  0.254  0.384  0.395  0.394  0.404

Precision @ 3
          QR1    QR2    QR3    QR4    QR5
MSN     0.504  0.687  0.770  0.775  0.757
Yahoo   0.496  0.688  0.758  0.801  0.780
Google  0.489  0.717  0.784  0.801  0.802
– RB2/RB6 best configurations within RBs, RB2 has highest SP@1 – IFM-RA-SW3 winner (best P@1)
RB/IFM Precision
Configuration    P@1    P@3
RB2            0.803  0.742
RB6            0.755  0.684
IFM-RA-SW1     0.524  0.502
IFM-RA-SW2     0.803  0.730
IFM-RA-SW3     0.887  0.794
IFM-RA-SW4     0.855  0.785
IFM-MC4-SW1    0.503  0.497
IFM-MC4-SW2    0.797  0.721
IFM-MC4-SW3    0.870  0.787
IFM-MC4-SW4    0.845  0.762
– However, precision decreases as a function of low recall – Optimal setting depends on the web search engine
as that of QR (best QR1 issues 2.25 terms attains P@3 of 0.504)
– particularly at SP@1 – Additional experiments showed that some good results are bubbling up from middle-tier of results (ranked between positions 100 and 1000) – Does not do well for SP@3 (problem if the “right” results are not recalled by selection part) – Requires substantial modifications to a web search engine
– achieves highest recall and overall relevancy – Can be competitive and, in some measures, superior to QR – More costly to execute
search:
– QR – RB – IFM
– can be easily implemented on top of a commodity search engine – Performs surprisingly well – Likely to be superior to manual query reformulation – Recall problems
– Outperforms both QR and RB in terms of recall and precision
contextual search implementers
Interested? Email your resume to: thinkbig@yahoo-inc.com
– Context: location – Query: “movie theater”
wants to learn more about it
– Context: news article – Query: review
– Context: search history, user preferences – Query: java