Entity-oriented Search Result Diversification



slide-1
SLIDE 1

Entity-oriented Search Result Diversification

Andrey Plakhov
Yandex

slide-2
SLIDE 2

Outline

  • Introduction
  • Brief overview
  • Optimization target: pFound-IA
  • Implementation details
  • Why entity-oriented?
  • Mining out entities and search intents
  • Some results
  • Further work

Yandex, 2011

slide-3
SLIDE 3

Аndrey Plakhov «Entity-oriented search result diversification»

A brief overview of “Spectrum”

  • Part of Yandex ranking dealing with ambiguous queries, e.g.
      • [moscow state university]
      • [pope john paul ii]
      • [jaguar]
  • Went into production in late 2010
  • Reformulation-driven search results diversification
  • Optimizes for an IA-type diversity metric
  • Greedy algorithm (similar to xQuAD)
  • Alters SERPs for 15-20% of search queries (~15 mln a day)
slide-4
SLIDE 4

Search results scanning: user behavior model

  • Start: j = 1
  • Look at the j-th search result
      • with probability pRel_j: answer found
      • with probability 1 - pRel_j: continue scanning?
          • with probability pContinue: j := j + 1, look at the next result
          • with probability 1 - pContinue: answer not found

slide-5
SLIDE 5

Search effectiveness metric: pFound

  • Similar to ERR (Chapelle et al., CIKM 09)
  • Problem: maximum when all N results are “exactly the same”
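
The formula on this slide is not preserved in the text. A minimal sketch, assuming pFound is the probability that the scanning model above ends in “answer found”: the user reaches position j with probability pLook_j, where pLook_1 = 1 and pLook_{j+1} = pLook_j · (1 - pRel_j) · pContinue, and pFound is the sum of pLook_j · pRel_j over positions. The function name and the default pContinue = 0.85 are illustrative assumptions, not values from the slides.

```python
def pfound(p_rel, p_continue=0.85):
    """Probability that a user scanning results top-down finds an answer.

    p_rel: per-position relevance probabilities (pRel_j), top result first.
    p_continue: probability of continuing after an unhelpful result
                (0.85 is an illustrative default, not a value from the slides).
    """
    p_look = 1.0   # probability that the user reaches the current position
    total = 0.0
    for rel in p_rel:
        total += p_look * rel                 # answer found at this position
        p_look *= (1.0 - rel) * p_continue    # not found, but keeps scanning
    return total


# Filling the page with copies of the best result beats a mixed page,
# which illustrates the problem stated on the slide.
same = pfound([0.9, 0.9])
mixed = pfound([0.9, 0.4])
```

With a single notion of relevance, nothing in this sum rewards variety, so the metric alone cannot motivate diversification.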

slide-6
SLIDE 6

Search effectiveness metric: pFound-IA (or wide pFound)

  • W_i: fraction of the i-th search intent amongst all query instances
  • pFound_i: probability that a user with the i-th search intent will find an answer (calculated as before)
  • Similar to other IA metrics (Agrawal et al., WSDM 09)

New problem: how do we learn search intents & their fractions?
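
The slide's formula is missing from the text. The definitions of W_i and pFound_i suggest the usual intent-aware form, a weighted sum of W_i · pFound_i over intents. The sketch below assumes that form; the [jaguar] intents, weights and relevance values are made up for illustration.

```python
def pfound(p_rel, p_continue=0.85):
    """Single-intent pFound under the scanning model (assumed form)."""
    p_look, total = 1.0, 0.0
    for rel in p_rel:
        total += p_look * rel
        p_look *= (1.0 - rel) * p_continue
    return total


def pfound_ia(intent_weights, p_rel_by_intent, p_continue=0.85):
    """Intent-aware pFound: sum over intents of W_i * pFound_i.

    intent_weights: {intent: W_i}, fractions that should sum to 1.
    p_rel_by_intent: {intent: [pRel_j for that intent, top result first]}.
    """
    return sum(
        weight * pfound(p_rel_by_intent[intent], p_continue)
        for intent, weight in intent_weights.items()
    )


# Hypothetical query [jaguar] with two intents and a two-result page.
weights = {"car": 0.7, "animal": 0.3}
all_cars = {"car": [0.9, 0.9], "animal": [0.0, 0.0]}
diverse = {"car": [0.9, 0.0], "animal": [0.0, 0.9]}

score_all_cars = pfound_ia(weights, all_cars)
score_diverse = pfound_ia(weights, diverse)
```

Under these assumed numbers the diverse page scores higher than the page with two car results, because the second car result adds little for users who already found their answer.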

slide-7
SLIDE 7

Excerpts from query stream

…
the old castle camping altai 2
the old castle camping astrakhan 1
the old castle camping lake teletskoe 1
the old castle camping saint mountains 1
the old castle camping teletskoe 1
the old castle camping teletskoe lake 1
the old castle camping teletskoe address 1
the old castle camping teletskoe phone 1
…

  • At least three different camping sites with the same name
  • One significantly more popular than the others
  • Some people state their search intent explicitly (“address” and “phone”), but most don’t

slide-8
SLIDE 8

Excerpts from query stream

…
audi a8 4.2 quattro mileage 1
audi a8 4.2 quattro miles per gallon 1
audi a8 4.2 quattro kiev 1
audi a8 4.2 quattro years of production 1
audi a8 4.2 quattro equipment 4
audi a8 4.2 quattro equipment 2003 2
audi a8 4.2 quattro reviews 1
audi a8 4.2 quattro owner review 1
audi a8 4.2 quattro specifications 3
…

  • Looking only at queries one can tell that it’s a car (or, less likely, another kind of vehicle)
  • Search intents have lots of synonymous “spellings”
  • Some search intents are geo-local (e.g. pricing), some aren’t (e.g. gas mileage, specifications)

slide-9
SLIDE 9

So?

Let’s use query expansions to understand user intents and/or different query meanings! (Santos et al., WWW 10)

But it turns out to be not as easy as it seems…

  • Not all expansions are “intents”
  • Intents differ not only in their probabilities
  • Several expansions could correspond to the same intent
  • Some query classes should be treated in a special way

slide-10
SLIDE 10

Outline

  • Introduction
  • Brief overview
  • Optimization target: pFound-IA
  • Implementation details
  • Why entity-oriented?
  • Mining out entities and search intents
  • Some results
  • Further work

slide-11
SLIDE 11

Why entity-oriented?

[Beijing duck] is an expansion for [Beijing], and a popular query, but it’s hardly an “intent” or an “aspect” for [Beijing].

We should have some instrument to distinguish between “good” and “bad” expansions.

slide-12
SLIDE 12

Why entity-oriented?

We focus on queries that fall into one of the most frequent and important categories from the predefined list:

  • Movies
  • Books
  • People
  • Gadgets
  • Cars
  • Diseases

For an entity in every category we could specify what users typically think about when issuing a corresponding query, e.g.

  • Cars: compare, reviews, images, info, buy new/used, parts
  • Diseases: symptoms, treatment, epidemiology, textbook
  • …

slide-13
SLIDE 13

Query model

We focus on 3 specific query types:

  • (entity)
  • (entity) (indicator)
  • (entity) (explicit search intent)

E.g. (adapted from Russian)

  • [el gauchito]
  • [el gauchito restaurant]
  • [el gauchito reviews]

slide-14
SLIDE 14

Query model

Entity query

An ambiguous query that just names some entity without any explicit clues about the underlying search intent. Spectrum’s primary target.

E.g.

  • [el gauchito]
  • [bmw x5]
  • [Spiderman chronicles]
  • [beijing]

slide-15
SLIDE 15

Query model

Entity+indicator query

A query that helps us classify the entity it contains. An indicator could explicitly name an entity class, or just be a clue.

E.g.

  • [el gauchito restaurant]
  • [bmw x5 car dealers]
  • [spiderman the movie]
  • [beijing zip code]

slide-16
SLIDE 16

Query model

Entity+intent query

Queries that help us select which intents, possible for a category in principle, are really present for a given entity, and what intent probabilities should be assigned.

E.g.

  • [el gauchito reviews] x 18, [el gauchito driving directions] x 5
  • [bmw x5 used] x 314, [bmw x5 reviews] x 2345, …
  • [spiderman 2012 trailer], [spiderman cast], …
  • [beijing weather], [beijing local time], …
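
One way to read “what intent probabilities should be assigned” is to normalize the counts of entity+intent queries over the intents accepted for the entity's category. The sketch below assumes that simple proportional scheme, which the slides do not confirm. The counts for “used” and “reviews” are from the slide; the “vs x6” query, the intent list and the function name are hypothetical.

```python
from collections import Counter


def intent_weights(entity, query_counts, valid_intents):
    """Estimate intent probabilities for `entity` from entity+intent queries.

    query_counts: {query string: frequency in the query stream}.
    valid_intents: expansions accepted as intents for the entity's category.
    Expansions outside `valid_intents` are ignored as non-intent expansions.
    """
    counts = Counter()
    prefix = entity + " "
    for query, freq in query_counts.items():
        if query.startswith(prefix):
            expansion = query[len(prefix):]
            if expansion in valid_intents:
                counts[expansion] += freq
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()} if total else {}


# "used" and "reviews" counts come from the slide; the "vs x6" line and the
# intent list are made up for illustration.
stream = {"bmw x5 used": 314, "bmw x5 reviews": 2345, "bmw x5 vs x6": 40}
car_intents = {"used", "reviews", "compare", "images", "parts"}
weights = intent_weights("bmw x5", stream, car_intents)
```

Intents that are valid for the category but never appear in the entity's expansions get no weight here, which matches the idea of keeping only intents that are “really present” for the entity.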

slide-17
SLIDE 17

Mining intents and indicators

Some expansions are frequent for different objects of one kind.

TV channels:

  • schedule
  • online
  • official site
  • online streaming

[Figure: frequent expansion terms for TV channels, including: channel, tv, schedule, russia, news, live, ua, com, ru]

slide-18
SLIDE 18

Mining intents and indicators

TV channels indicators:

  • tv
  • tv schedule
  • channel
  • tv channel

TV channels intents:

  • tv schedule
  • schedule
  • live streaming
  • russian web site
  • global web site
  • news

Indicators vs intents:

  • “tv channel” is an indicator, but not a search intent
  • “schedule” is a search intent, but not an indicator
  • “tv schedule” can play both roles
  • “weather” is neither an indicator nor a search intent

Both lists (popular search intents and category indicators) can be mined in a semi-automated manner.
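
The automated half of that mining can be sketched as follows: for each expansion, count how many distinct entities of one category it follows, and keep the expansions shared by a large enough share of them. This is an assumed procedure for illustration, and its output is only a candidate list for the manual step. The channel names, queries and the `min_share` threshold are hypothetical.

```python
from collections import defaultdict


def frequent_expansions(queries, entities, min_share=0.5):
    """Return expansions that follow at least `min_share` of `entities`.

    queries: iterable of query strings from the stream.
    entities: known entity names of one category (e.g. TV channels).
    The result is a candidate list for manual review, not a final
    indicator or intent list.
    """
    seen_with = defaultdict(set)   # expansion -> entities it appeared with
    for query in queries:
        for entity in entities:
            if query.startswith(entity + " "):
                seen_with[query[len(entity) + 1:]].add(entity)
    threshold = min_share * len(entities)
    return sorted(exp for exp, ents in seen_with.items() if len(ents) >= threshold)


# Hypothetical channel names and queries.
channels = ["channel one", "ntv", "tnt"]
stream = [
    "channel one schedule", "ntv schedule", "tnt schedule",
    "ntv live streaming", "tnt live streaming",
    "ntv anniversary concert",
]
candidates = frequent_expansions(stream, channels)
```

An expansion tied to a single entity (“anniversary concert” here) is dropped, while expansions shared across the category survive for a human to label as indicator, intent, both, or neither.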

slide-19
SLIDE 19

Entity categorization

Machine learning:

  • several methods tested
  • most of them often give inexplicable results
  • max. precision about 90% even after weeks of fine tuning

Semi-automated manner:

  • automatically collect indicator candidates (in a manner similar to TF/IDF)
  • filter them manually, combine into several groups close in value
  • an entity belongs to a category iff it has enough indicator expansions
  • additional indicators can be used depending on category: Wikipedia categories, predefined lists, specific query terms, clicked sites etc.

After experiments we strongly prefer using the second approach, at least for Spectrum:

  + Much higher precision (up to …), explainable results
  - Needs some manual labor (approx. an hour for one category)
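
A rough sketch of the semi-automated approach, under stated assumptions: candidates are scored by how common an expansion is among seed entities of the category, multiplied by a log penalty for how common it is overall (one possible “TF/IDF-like” score), and membership is decided by summing hand-assigned indicator weights against a threshold. The scoring formula, weights, threshold and all data below are illustrative, not taken from the slides.

```python
import math


def indicator_candidates(expansions_by_entity, category_entities):
    """Score expansions TF/IDF-style: common inside the category, rare outside.

    expansions_by_entity: {entity: set of expansions seen in the query stream}.
    category_entities: non-empty list of seed entities known to be in the
                       category; each must be a key of expansions_by_entity.
    """
    n_all = len(expansions_by_entity)
    scores = {}
    for exp in set().union(*expansions_by_entity.values()):
        in_cat = sum(exp in expansions_by_entity[e] for e in category_entities)
        overall = sum(exp in exps for exps in expansions_by_entity.values())
        scores[exp] = (in_cat / len(category_entities)) * math.log(n_all / overall)
    return scores


def belongs_to_category(expansions, indicator_weights, threshold=1.0):
    """Entity is in the category iff its indicator expansions weigh enough."""
    return sum(indicator_weights.get(exp, 0.0) for exp in expansions) >= threshold


# Hypothetical data: two seed cars plus two entities from other categories.
expansions_by_entity = {
    "bmw x5": {"reviews", "car dealers", "used"},
    "audi a8": {"reviews", "car dealers", "specifications"},
    "nokia n8": {"reviews", "user manual"},
    "beijing": {"weather", "zip code"},
}
scores = indicator_candidates(expansions_by_entity, ["bmw x5", "audi a8"])

# Stand-in for the manual step: a reviewer keeps and weights good indicators.
car_indicators = {"car dealers": 1.0, "test drive": 1.0, "car": 0.5}
is_car = belongs_to_category(expansions_by_entity["bmw x5"], car_indicators)
is_city_a_car = belongs_to_category(expansions_by_entity["beijing"], car_indicators)
```

Because the final decision is a sum over a short, human-reviewed list, every classification can be traced back to the specific indicators that triggered it.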

slide-20
SLIDE 20

Universal intents

Some query expansions should always be considered as explicitly stated intents, even if we don’t know the entity’s category. E.g.:

  • images/photos
  • user manual/guide (as in [nokia n8 user manual])
  • translation (as in [c’est la vie translation])
  • definition (as in [extrovert definition])
  • abbreviation meaning (as in [nyc abbreviation meaning])
  • list (as in [7 wonders list])
  • price (as in [ipad 2 price in europe])
  • 2011, 2012 (as in [spiderman 2012])

This list depends on language/country (literal translation isn’t enough); these examples are specific to Russian search queries.

slide-21
SLIDE 21

Solving the diversity problem

For an ambiguous query we know:

  • what categories it falls into
  • what search intents valid for those categories we see in its expansions, and in what counts
  • what universal search intents are present in its expansions, and in what counts

This is enough info to infer the list of search intents along with their weights.

So now we possess all the info needed to produce a ranking that maximizes wide pFound (or any other IA-type metric).
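
The overview slide describes Spectrum's algorithm only as greedy and similar to xQuAD. The sketch below is one plausible greedy scheme, not the production implementation: at each position it picks the document with the largest marginal gain in pFound-IA, assuming the metric is the weighted sum of per-intent pFound values. Document ids, relevance values and pContinue = 0.85 are hypothetical.

```python
def greedy_diversify(docs, intent_weights, k, p_continue=0.85):
    """Greedily build a top-k ranking by marginal gain in pFound-IA.

    docs: {doc_id: {intent: pRel of the doc for that intent}}.
    intent_weights: {intent: W_i}.
    """
    # Per-intent probability that the user is still looking at this position.
    p_look = {intent: 1.0 for intent in intent_weights}
    ranking, remaining = [], set(docs)
    for _ in range(min(k, len(docs))):
        def gain(doc_id):
            rel = docs[doc_id]
            return sum(w * p_look[i] * rel.get(i, 0.0)
                       for i, w in intent_weights.items())
        best = max(sorted(remaining), key=gain)   # sorted() makes ties stable
        ranking.append(best)
        remaining.remove(best)
        # Users whose intent was served are less likely to keep scanning.
        for i in p_look:
            p_look[i] *= (1.0 - docs[best].get(i, 0.0)) * p_continue
    return ranking


# Hypothetical candidates for [jaguar].
weights = {"car": 0.7, "animal": 0.3}
docs = {
    "car_a": {"car": 0.9},
    "car_b": {"car": 0.8},
    "zoo": {"animal": 0.9},
}
top2 = greedy_diversify(docs, weights, k=2)
```

Once a strong car result is placed, the remaining gain for the car intent shrinks, so the animal page overtakes the second car page for position two.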

slide-22
SLIDE 22

Outline

  • Introduction
  • Brief overview
  • Optimization target: pFound-IA
  • Implementation details
  • Why entity-oriented?
  • Mining out entities and search intents
  • Special query and intent types
  • Some results
  • Further work

slide-23
SLIDE 23

Some results

Good news:

  • 1% less abandonment on popular queries
  • CTR for positions 2 to 10 is up 2-5%

slide-24
SLIDE 24

Good news: allowed us to easily implement search intent highlighting in snippets

slide-25
SLIDE 25

Some results

Bad news: Spectrum-oriented…

slide-26
SLIDE 26

Outline

  • Introduction
  • Brief overview
  • Optimization target: pFound-IA
  • Implementation details
  • Why entity-oriented?
  • Mining out entities and search intents
  • Special query and intent types
  • Evaluation
  • Further work

slide-27
SLIDE 27

Further work

  • Automated synonymic intents detection
      • «download» and «download for free»
      • «trailer» and «movie trailer online»
      • «new york» and «in nyc»
  • Automatically separate explicitly stated intents from non-intent expansions
  • Parsing query stream as a specific case of a natural language acquisition problem

slide-28
SLIDE 28

Natural query language

  • it’s not Russian (or any other spoken language)
  • many NL observations still hold (e.g. Zipf’s law)
  • lacks recursion and complex grammar features
  • not “bag of words” either

Bag of {entity; clarification; search intent}
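
To make the {entity; clarification; search intent} view concrete, here is a deliberately simplified sketch that fills those three slots by dictionary lookup. It assumes the entity comes first and is followed by at most one expansion, which real queries do not guarantee. The function name and the phrase sets are hypothetical.

```python
def parse_query(query, entities, indicators, intents):
    """Split a query into {entity; clarification; search intent} parts.

    entities, indicators, intents: sets of known phrases (hypothetical inputs;
    in practice they would come from mining the query stream).
    Returns None when no known entity starts the query.
    """
    # Try the longest matching entity first so "bmw x5" beats "bmw".
    for entity in sorted(entities, key=len, reverse=True):
        if query == entity or query.startswith(entity + " "):
            rest = query[len(entity):].strip()
            return {
                "entity": entity,
                "clarification": rest if rest in indicators else None,
                "intent": rest if rest in intents else None,
            }
    return None


known_entities = {"el gauchito", "bmw", "bmw x5"}
known_indicators = {"restaurant", "car dealers"}
known_intents = {"reviews", "used"}

parsed = parse_query("el gauchito reviews",
                     known_entities, known_indicators, known_intents)
```

An expansion listed in both sets would fill both slots, which mirrors the observation that a phrase such as “tv schedule” can be an indicator and an intent at once.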

slide-29
SLIDE 29

Natural query language

Emerges and evolves in a natural way:

  • users learn frequent patterns
  • expressive constructs become more frequent
  • ineffective constructs perish

Makes a perfect research target:

  • simple grammar
  • full usage statistics
  • complete corpus with frequencies assigned
  • its changes over time
  • now available for industry, not for academia
  • privacy issues

slide-30
SLIDE 30

Questions?

*

*Rule 14: slides must contain at least one kitten (and I got two!)