Entity-oriented Search Result Diversification
Andrey Plakhov, Yandex
Outline
- Introduction
- Brief overview
- Optimization target: pFound-IA
- Implementation details
- Why entity-oriented?
- Mining out entities and search intents
- Some results
- Further work
Yandex, 2011
Andrey Plakhov «Entity-oriented search result diversification»
A brief overview of “Spectrum”
- Part of Yandex ranking dealing with ambiguous queries, e.g.
- [moscow state university]
- [pope john paul ii]
- [jaguar]
- Went into production late 2010
- Reformulation-driven search results diversification
- Optimizes for IA-type diversity metric
- Greedy algorithm (similar to xQuAD)
- Alters SERPs for 15-20% of search queries (~15 mln a day)
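Search results scanning: user behavior model
- Start: j = 1
- Look at the j-th search result
- With probability pRel_j: answer found
- With probability 1 - pRel_j: continue scanning?
- With probability pContinue: j := j + 1, look at the next result
- With probability 1 - pContinue: answer not found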
Search effectiveness metric: pFound
Similar to ERR (Chapelle et al., CIKM 09)
Problem: maximum when all N results are “exactly the same”
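The scanning model can be turned into a short computation. The sketch below is an illustration under assumptions, not the production code: the function name `pfound` and the default `p_continue=0.85` are hypothetical choices, and the per-result relevance probabilities are supposed to come from elsewhere.

```python
from typing import Sequence


def pfound(p_rel: Sequence[float], p_continue: float = 0.85) -> float:
    """Probability that a user scanning results top-down finds an answer.

    p_rel[j] is the probability that the j-th result answers the query;
    p_continue (an assumed value) is the probability of moving on to the
    next result after an unsuccessful one.
    """
    p_look = 1.0  # probability that the user looks at the current result
    found = 0.0
    for p in p_rel:
        found += p_look * p                # answer found at this position
        p_look *= (1.0 - p) * p_continue   # not found, and the user continues
    return found
```

Nothing in this computation looks at what a result is about, so N copies of one relevant page score exactly as well as N different relevant pages.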
Search effectiveness metric: pFound-IA (or wide pFound)
- W_i: fraction of the i-th search intent amongst all query instances
- pFound_i: probability that a user with the i-th search intent will find an answer (calculated as before)
Similar to other IA metrics (Agrawal et al., WSDM 09)
New problem: how do we learn search intents & their fractions?
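The two definitions suggest a weighted sum of per-intent pFound values. The formula itself did not survive in the text, so the sketch below assumes that form. The names `pfound_ia`, `weights` and `p_rel_by_intent` and the value `p_continue=0.85` are hypothetical.

```python
from typing import Mapping, Sequence


def pfound(p_rel: Sequence[float], p_continue: float = 0.85) -> float:
    """pFound for one intent: chance that a top-down scan finds an answer."""
    p_look, found = 1.0, 0.0
    for p in p_rel:
        found += p_look * p
        p_look *= (1.0 - p) * p_continue
    return found


def pfound_ia(weights: Mapping[str, float],
              p_rel_by_intent: Mapping[str, Sequence[float]],
              p_continue: float = 0.85) -> float:
    """Assumed form: sum over intents of W_i * pFound_i.

    p_rel_by_intent[i][j] is the relevance of the j-th result for intent i.
    """
    return sum(w * pfound(p_rel_by_intent[intent], p_continue)
               for intent, w in weights.items())


weights = {"car": 0.5, "animal": 0.5}
# Two car pages: perfect for one intent, useless for the other.
same = pfound_ia(weights, {"car": [1.0, 1.0], "animal": [0.0, 0.0]})
# One page per intent.
mixed = pfound_ia(weights, {"car": [1.0, 0.0], "animal": [0.0, 1.0]})
```

With these made-up numbers the mixed page scores higher than the page of two car results, which is the behavior plain pFound lacks.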
Excerpts from query stream
…
the old castle camping altai 2
the old castle camping astrakhan 1
the old castle camping lake teletskoe 1
the old castle camping saint mountains 1
the old castle camping teletskoe 1
the old castle camping teletskoe lake 1
the old castle camping teletskoe address 1
the old castle camping teletskoe phone 1
…
- At least three different camping sites with the same name
- One significantly more popular than the others
- Some people state their search intent explicitly (“address” and “phone”), but most don’t
Excerpts from query stream
…
audi a8 4.2 quattro mileage 1
audi a8 4.2 quattro miles per gallon 1
audi a8 4.2 quattro kiev 1
audi a8 4.2 quattro years of production 1
audi a8 4.2 quattro equipment 4
audi a8 4.2 quattro equipment 2003 2
audi a8 4.2 quattro reviews 1
audi a8 4.2 quattro owner review 1
audi a8 4.2 quattro specifications 3
…
- Looking only at queries one can tell that it’s a car (or, less likely, other kind of vehicle)
- Search intents have lots of synonymous “spellings”
- Some search intents are geo-local (e.g. pricing), some aren’t (e.g. gas mileage, specifications)
So?
Let’s use query expansions to understand user intents and/or different query meanings! (Santos et al., WWW 10)
But it turns out to be not as easy as it seems…
- Not all expansions are “intents”
- Intents differ not only in their probabilities
- Several expansions could correspond to the same intent
- Some query classes should be treated in a special way
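Collecting the raw expansions is the easy part. The caveats listed here are about what to do with them afterwards. As a minimal sketch, assuming the query stream is available as (query, count) pairs and that an expansion is simply whatever follows the entity name, the counting could look like this. The function name `expansions` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def expansions(query_counts: Iterable[Tuple[str, int]], entity: str) -> Counter:
    """Count the text that follows `entity` in queries starting with it.

    Assumption: only suffix expansions are considered; real query streams
    also contain prefixes, reorderings and misspellings.
    """
    result: Counter = Counter()
    prefix = entity + " "
    for query, count in query_counts:
        if query.startswith(prefix):
            result[query[len(prefix):]] += count
    return result


stream = [
    ("audi a8 4.2 quattro mileage", 1),
    ("audi a8 4.2 quattro equipment", 4),
    ("audi a8 4.2 quattro equipment 2003", 2),
    ("audi a8 4.2 quattro specifications", 3),
    ("the old castle camping altai", 2),
]
audi = expansions(stream, "audi a8 4.2 quattro")
```

The resulting counter still mixes intents, synonyms and non-intents.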
Outline
- Introduction
- Brief overview
- Optimization target: pFound-IA
- Implementation details
- Why entity-oriented?
- Mining out entities and search intents
- Some results
- Further work
Why entity-oriented?
[Beijing duck] is an expansion for [Beijing], and a popular query, but it’s hardly an “intent” or an “aspect” for [Beijing]
We should have some instrument to distinguish between “good” and “bad” expansions
Why entity-oriented?
We focus on queries that fall into one of the most frequent and important categories from a predefined list:
- Movies
- Books
- People
- Gadgets
- Cars
- Diseases
- …
For an entity in every category we could specify what users could typically think about when issuing a corresponding query, e.g.
Cars: compare, reviews, images, info, buy new/used, parts
Diseases: symptoms, treatment, epidemiology, textbook
…
Query model
We focus on 3 specific query types
- (entity)
- (entity) (indicator)
- (entity) (explicit search intent)
E.g. (adapted from Russian)
- [el gauchito]
- [el gauchito restaurant]
- [el gauchito reviews]
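The three query types can be told apart mechanically once lists of entities, indicators and intents exist. The sketch below assumes exact string matching and that the expansion follows the entity. The function name `classify`, the label strings and the "other" fallback are hypothetical.

```python
from typing import Optional, Set, Tuple


def classify(query: str, entities: Set[str], indicators: Set[str],
             intents: Set[str]) -> Optional[Tuple[str, str, str]]:
    """Return (query type, entity, expansion), or None if no entity matches.

    Assumption: when an expansion is both an intent and an indicator,
    it is reported as an intent; that choice is arbitrary.
    """
    # Try longer entity names first so the most specific one wins.
    for entity in sorted(entities, key=len, reverse=True):
        if query == entity:
            return ("entity", entity, "")
        if query.startswith(entity + " "):
            rest = query[len(entity) + 1:]
            if rest in intents:
                return ("entity+intent", entity, rest)
            if rest in indicators:
                return ("entity+indicator", entity, rest)
            return ("other", entity, rest)
    return None


entities = {"el gauchito"}
indicators = {"restaurant"}
intents = {"reviews"}
kinds = [classify(q, entities, indicators, intents)
         for q in ("el gauchito", "el gauchito restaurant", "el gauchito reviews")]
```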
Query model
Entity query
An ambiguous query that just names some entity without any explicit clues about underlying search intent. Spectrum’s primary target.
E.g.
- [el gauchito]
- [bmw x5]
- [Spiderman chronicles]
- [beijing]
Query model
Entity+indicator query
A query that helps us classify an entity it contains. An indicator could explicitly name an entity class, or just be a clue
E.g.
- [el gauchito restaurant]
- [bmw x5 car dealers]
- [spiderman the movie]
- [beijing zip code]
Query model
Entity+intent query
Queries that help us select which intents, possible for a category in principle, are really present for a given entity, and what intent probabilities should be assigned
E.g.
- [el gauchito reviews] x 18, [el gauchito driving directions] x 5
- [bmw x5 used] x 314, [bmw x5 reviews] x 2345, …
- [spiderman 2012 trailer], [spiderman cast], …
- [beijing weather], [beijing local time], …
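The slides do not say how counts become intent probabilities. The simplest reading is plain normalization, sketched below as an assumption. The function name `intent_weights` is hypothetical, and smoothing or handling of users who state no intent is left out.

```python
from typing import Dict, Mapping


def intent_weights(counts: Mapping[str, int]) -> Dict[str, float]:
    """Turn expansion counts into weights that sum to 1 (assumed scheme)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {intent: count / total for intent, count in counts.items()}


# Counts taken from the [el gauchito] example.
weights = intent_weights({"reviews": 18, "driving directions": 5})
```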
Mining intents and indicators
Some expansions are frequent for different objects of one kind
TV channels
- schedule
- online
- official site
- online streaming
[Figure: expansion terms for TV channel queries, including channel, tv, schedule, russia, news, live, ua, com, ru]
Mining intents and indicators
TV channels indicators:
- tv
- tv schedule
- channel
- tv channel
TV channels intents:
- schedule
- live streaming
- russian web site
- global web site
- news
Indicators vs intents
- “tv channel” is an indicator, but not a search intent
- “schedule” is a search intent, but not an indicator
- “tv schedule” can play both roles
- “weather” is neither an indicator, nor a search intent
Both lists (popular search intents and category indicators) can be mined in a semi-automated manner
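The automated half of this mining can be sketched as counting how many entities of one kind share each expansion, leaving the final choice to a human. Everything below is an assumption for illustration: the function name `common_expansions`, the `min_share` threshold and the sample entities are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set


def common_expansions(expansions_by_entity: Mapping[str, Iterable[str]],
                      min_share: float = 0.5) -> List[str]:
    """Expansions seen with at least `min_share` of the given entities.

    Returns candidates only, most widespread first; deciding which are
    intents and which are indicators remains a manual step.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for entity, exps in expansions_by_entity.items():
        for expansion in exps:
            seen[expansion].add(entity)
    n = len(expansions_by_entity)
    if n == 0:
        return []
    picked = [e for e, ents in seen.items() if len(ents) / n >= min_share]
    return sorted(picked, key=lambda e: (-len(seen[e]), e))


# Made-up TV channel names with made-up expansions.
channels = {
    "channel a": ["schedule", "online", "official site"],
    "channel b": ["schedule", "online", "news"],
    "channel c": ["schedule", "live streaming"],
}
candidates = common_expansions(channels, min_share=0.6)
```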
Entity categorization
Machine learning:
- several methods tested
- most of them often give inexplicable results
- max. precision about 90% even after weeks of fine tuning
Semi-automated manner:
- automatically collect indicator candidates (in a manner similar to TF/IDF)
- filter them manually, combine into several groups close in value
- an entity belongs to a category iff it has enough indicator expansions
- additional indicators can be used depending on category: Wikipedia categories, predefined lists, specific query terms, clicked sites etc.
After experiments we strongly prefer using the second approach, at least for Spectrum
+ Much higher precision (up to …), explainable results
- Needs some manual labor (approx. an hour for one category)
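The rule "an entity belongs to a category iff it has enough indicator expansions" can be sketched as a set intersection with a threshold. Reading "enough" as a fixed count is an assumption, as are the function name `categories_for`, the `min_hits` default and the sample indicator lists.

```python
from typing import List, Mapping, Set


def categories_for(entity_expansions: Set[str],
                   indicators_by_category: Mapping[str, Set[str]],
                   min_hits: int = 2) -> List[str]:
    """Categories whose indicator list overlaps the entity's expansions
    in at least `min_hits` items. An entity may get several categories."""
    return sorted(
        category
        for category, indicators in indicators_by_category.items()
        if len(entity_expansions & indicators) >= min_hits
    )


# Hand-filtered indicator lists (made up for the example).
indicators_by_category = {
    "tv channel": {"tv", "channel", "tv channel", "tv schedule"},
    "restaurant": {"restaurant", "cafe"},
}
found = categories_for({"tv", "tv schedule", "news"}, indicators_by_category)
```

Because the decision reduces to which indicators matched, every categorization can be explained by listing them.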
Universal intents
Some query expansions should always be considered as explicitly stated intents, even if we don’t know the entity’s category. E.g.:
- images/photos
- user manual/guide (as in [nokia n8 user manual])
- translation (as in [c’est la vie translation])
- definition (as in [extrovert definition])
- abbreviation meaning (as in [nyc abbreviation meaning])
- list (as in [7 wonders list])
- price (as in [ipad 2 price in europe])
- 2011, 2012 (as in [spiderman 2012])
This list depends on language/country (literal translation isn’t enough), these examples being specific for Russian search queries
Solving the diversity problem
For an ambiguous query we know
- what categories it falls into
- what search intents valid for those categories we see in its expansions, and in what counts
- what universal search intents are present in its expansions, and in what counts
Enough info to infer search intents list along with their weights
So now we possess all needed info to produce a ranking that maximizes wide pFound (or any other IA-type metric)
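One way to turn intent weights into a ranking is a greedy loop that repeatedly adds the document with the largest marginal gain in wide pFound. The deck only says the production algorithm is greedy and similar to xQuAD, so the sketch below is an assumption-based illustration, not Spectrum itself. The function name `greedy_rank`, the per-intent relevance estimates and `p_continue=0.85` are hypothetical.

```python
from typing import Dict, List, Mapping


def greedy_rank(docs: Mapping[str, Mapping[str, float]],
                weights: Mapping[str, float],
                k: int,
                p_continue: float = 0.85) -> List[str]:
    """Greedily pick up to k documents to maximize weighted pFound.

    docs[d][i] is the assumed probability that document d answers intent i.
    """
    # p_look[i]: chance that a user with intent i is still scanning.
    p_look: Dict[str, float] = {intent: 1.0 for intent in weights}
    remaining = dict(docs)
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        def gain(doc_id: str) -> float:
            rel = remaining[doc_id]
            return sum(w * p_look[i] * rel.get(i, 0.0)
                       for i, w in weights.items())
        best = max(sorted(remaining), key=gain)  # ties broken by id
        rel = remaining.pop(best)
        ranking.append(best)
        for i in p_look:
            p_look[i] *= (1.0 - rel.get(i, 0.0)) * p_continue
    return ranking


# Made-up relevance estimates for the ambiguous query [jaguar].
docs = {
    "car1": {"car": 0.9},
    "car2": {"car": 0.8},
    "zoo": {"animal": 0.7},
}
ranking = greedy_rank(docs, {"car": 0.6, "animal": 0.4}, k=3)
```

In this example the second slot goes to the animal page even though the remaining car page has higher raw relevance, because most car-intent users are already satisfied by the first result.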
Outline
- Introduction
- Brief overview
- Optimization target: pFound-IA
- Implementation details
- Why entity-oriented?
- Mining out entities and search intents
- Special query and intent types
- Some results
- Further work
Some results
Good news:
- 1% less abandonment on popular queries
- CTR for positions 2 to 10 is up 2-5%
Good news: allowed us to easily implement search intent highlighting in snippets
Some results
Bad news: Spectrum-oriented…
Outline
- Introduction
- Brief overview
- Optimization target: pFound-IA
- Implementation details
- Why entity-oriented?
- Mining out entities and search intents
- Special query and intent types
- Evaluation
- Further work
Further work
- Automated synonymic intents detection
  - «download» and «download for free»
  - «trailer» and «movie trailer online»
  - «new york» and «in nyc»
- Automatically separate explicitly stated intents from non-intent expansions
- Parsing query stream as a specific case of a natural language acquisition problem
Natural query language
- it’s not Russian (or any other spoken language)
- many NL observations still hold (e.g. Zipf’s law)
- lacks recursion and complex grammar features
- not “bag of words” either
Bag of {entity; clarification; search intent}
Natural query language
Emerges and evolves in a natural way
- users learn frequent patterns
- expressive constructs become more frequent
- ineffective constructs perish
Makes a perfect research target
- simple grammar
- full usage statistics
- complete corpus with frequencies assigned
- its changes over time
- now available for industry, not for academia
- privacy issues
Questions?*
*Rule 14: slides must contain at least one kitten (and I got two!)