Query reformulation model and patterns: from “dango” to “japanese cakes”

SLIDE 1

Query reformulation model and patterns

from “dango” to “japanese cakes”

Paolo Boldi (M)
Francesco Bonchi (Y)
Carlos Castillo (Y)
Sebastiano Vigna (M)

M: Università degli Studi di Milano, Italy
Y: Yahoo! Research, Barcelona, Spain

SLIDE 2

Query reformulation: model and patterns

from “dango” to “japanese cakes”

SLIDE 3

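Figure: an example search session with labeled reformulations. Queries shown: barcelona, cheap barcelona hotels, barcelona hotels, luxury barcelona hotels, brcelona, barcelona f.c. Transition labels shown: Specialize, Generalize, Correct, Generalize, Parallel move, Specialize, Specialize.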

Rieh, S. Y. and Xie, H.: “Analysis of multiple query reformulations on the web”. IPM 42(3), 2006.

SLIDE 4

Reformulation types

Error correction

startford cinema → stratford cinema

Generalization (“zoom out”)

barcelona hotels → barcelona

Specialization (“zoom in”)

barcelona soccer → barcelona camp nou

The names zoom-in, zoom-out and pan come from Y!SAMA. Rieh and Xie: “Analysis of multiple query reformulations”. IPM 2006.

SLIDE 5

Reformulation types

Rephrasing

wikipedia english → english wikipedia
robbs celebrities → robbs celebs

Parallel move

barcelona → rome

Rieh and Xie: “Analysis of multiple query reformulations”. IPM 2006.
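
The reformulation types can be approximated, very roughly, by surface rules over term sets and character edit distance. The following is a naive illustration under that assumption, not the classifier described in this work; the function names, the whitespace tokenization and the `max_typo=2` threshold are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def naive_reformulation_type(q1: str, q2: str, max_typo: int = 2) -> str:
    """Hypothetical rule-based labeller for a reformulation q1 -> q2."""
    t1, t2 = set(q1.split()), set(q2.split())
    if t1 == t2:
        return "rephrasing"        # same terms, e.g. reordered
    if t1 < t2:
        return "specialization"    # terms only added ("zoom in")
    if t2 < t1:
        return "generalization"    # terms only removed ("zoom out")
    if levenshtein(q1, q2) <= max_typo:
        return "error correction"  # small character-level change
    return "parallel move"         # everything else


for q1, q2 in [("startford cinema", "stratford cinema"),
               ("barcelona hotels", "barcelona"),
               ("wikipedia english", "english wikipedia"),
               ("barcelona", "rome")]:
    print(f"{q1!r} -> {q2!r}: {naive_reformulation_type(q1, q2)}")
```

On the slides' own example barcelona soccer → barcelona camp nou, this heuristic answers "parallel move" rather than "specialization", because "soccer" is replaced rather than extended; such cases show the limits of pure surface rules.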

SLIDE 6

P. Boldi, F. Bonchi, C. Castillo, S. Vigna: “Query Reformulation Model and Patterns”. 2008.

Why model reformulation types?

Improved session segmentation
Improved recommendations
Improved session understanding in general

SLIDE 7

Research agenda

Automatically classify query reformulation types
Study patterns of query reformulation

C C S S G S ... S P S C S S ... session DNA

Annotate the query-flow graph

SLIDE 8

SLIDE 9

SLIDE 10

Model for classification

Labeled examples

1,357 examples: 2/3 for training, 1/3 for testing

Features

Same as chains + edit distance + delta lengths + ...

Learning method

Find easy cases first, solve hard cases later
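
A sketch of pair-level features of the kind listed above (edit distance, length deltas, term overlap), together with a seeded 2/3 training / 1/3 testing split. The feature names, the Jaccard overlap and the seed are illustrative assumptions; the chain features mentioned on the slide are not reproduced here.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pair_features(q1: str, q2: str) -> dict:
    """Hypothetical surface features for the reformulation q1 -> q2."""
    t1, t2 = q1.split(), q2.split()
    s1, s2 = set(t1), set(t2)
    union = s1 | s2
    return {
        "edit_distance": levenshtein(q1, q2),
        "delta_chars": len(q2) - len(q1),   # negative when the query shrinks
        "delta_terms": len(t2) - len(t1),
        "term_jaccard": len(s1 & s2) / len(union) if union else 1.0,
        "added_terms": len(s2 - s1),
        "removed_terms": len(s1 - s2),
    }


def split_two_thirds(examples, seed=0):
    """Shuffle with a fixed seed, keep 2/3 for training and 1/3 for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


print(pair_features("barcelona hotels", "barcelona"))
train, test = split_two_thirds(range(1357))
print(len(train), len(test))  # 904 453
```

Fixing the seed keeps the split reproducible, so the same held-out third is used when comparing feature sets.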

SLIDE 11
SLIDE 12
SLIDE 13

Example classifier output

92% accuracy in the 4-class problem

SLIDE 14

Research agenda

Automatically classify query reformulation types
Study patterns of query reformulation

C C S S G S ... S P S C S S ... session DNA

Annotate the query-flow graph

SLIDE 15

Datasets

Yahoo! UK search engine

3.4M chains containing 6.6M queries

Yahoo! US search engine

4.0M chains containing 10.5M queries
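
Assuming every consecutive pair of queries in a chain counts as one reformulation, a chain of k queries contains k − 1 reformulations, so the totals above also give the mean chain length and the number of reformulations (queries minus chains). A back-of-the-envelope sketch using the rounded figures from the slide, so the results are approximate; the function name is hypothetical.

```python
def chain_stats(chains: float, queries: float) -> tuple:
    """Return (mean queries per chain, total reformulations)."""
    return queries / chains, queries - chains


for name, chains, queries in [("Yahoo! UK", 3.4e6, 6.6e6),
                              ("Yahoo! US", 4.0e6, 10.5e6)]:
    mean_len, reformulations = chain_stats(chains, queries)
    print(f"{name}: {mean_len:.2f} queries per chain, "
          f"about {reformulations / 1e6:.1f}M reformulations")
```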

SLIDE 16

Distribution of chain length

SLIDE 17

Distribution of reformulation types

SLIDE 18

Conditional probability relative to the prior: P(x | previous = y) / P(x)

Generalizations tend to follow specializations
Corrections tend to follow corrections
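
The ratio P(x | previous = y) / P(x) is a lift: values above 1 mean that type x follows type y more often than its overall frequency alone would suggest. A minimal sketch, assuming each chain is encoded as a string of type letters (the "session DNA" notation, e.g. "CSSG") and that the prior P(x) is the overall label frequency; the function name and the toy chains are hypothetical.

```python
from collections import Counter


def transition_lift(chains):
    """Map (y, x) -> P(x | previous = y) / P(x) over label strings."""
    label_counts = Counter()  # occurrences of each label, for the prior P(x)
    pair_counts = Counter()   # occurrences of label x immediately after label y
    prev_counts = Counter()   # how often label y has a successor
    for chain in chains:
        label_counts.update(chain)
        for prev, cur in zip(chain, chain[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    total = sum(label_counts.values())
    return {
        (y, x): (n / prev_counts[y]) / (label_counts[x] / total)
        for (y, x), n in pair_counts.items()
    }


toy = ["SSG", "SG", "CCS", "P"]  # made-up chains, not data from the study
for (y, x), lift in sorted(transition_lift(toy).items()):
    print(f"P({x} | previous={y}) / P({x}) = {lift:.2f}")
```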

SLIDE 19

Salient patterns

Specialization/generalization pairs
Corrections beginning or ending a chain
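
Both salient patterns can be checked by scanning label strings: count adjacent S/G pairs, and record whether each correction sits at the start, at the end or in the middle of its chain. A sketch over the same assumed one-letter-per-reformulation encoding; the function names and toy chains are invented for illustration.

```python
from collections import Counter


def count_pairs(chains, pairs=("SG", "GS")):
    """Count adjacent occurrences of the given two-letter patterns."""
    counts = Counter()
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            if a + b in pairs:
                counts[a + b] += 1
    return counts


def label_positions(chains, label="C"):
    """Classify each occurrence of `label` as first, last or middle of its chain."""
    positions = Counter()
    for chain in chains:
        for i, t in enumerate(chain):
            if t != label:
                continue
            if i == 0:
                positions["first"] += 1  # a one-letter chain counts as "first"
            elif i == len(chain) - 1:
                positions["last"] += 1
            else:
                positions["middle"] += 1
    return positions


toy = ["CSSG", "SGC", "CPSC", "SCS"]
print(count_pairs(toy))
print(label_positions(toy))
```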

SLIDE 20

Typical patterns

SLIDE 21

Research agenda

Automatically classify query reformulation types
Study patterns of query reformulation

C C S S G S ... S P S C S S ... session DNA

Annotate the query-flow graph

SLIDE 22

Example annotated sub-graph

SLIDE 23

Interesting properties

Let G, S, P, C represent the corresponding slices of the query-flow graph

Correlated pairs:

G and S^T, S and G^T (tend to be anti-symmetric)
C and C^T, P and P^T (tend to be symmetric)
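
One way to quantify these correlations, assuming each slice is stored as a set of directed edges (u, v), is the share of edges of A whose reversal lies in B, i.e. |A ∩ B^T| / |A|; with B = A this measures symmetry. This unweighted measure is an illustrative choice, not necessarily the one used in the study, and the edge sets are toy examples.

```python
def transpose(edges):
    """Reverse every directed edge (u, v) -> (v, u)."""
    return {(v, u) for (u, v) in edges}


def reverse_overlap(a, b):
    """|A ∩ B^T| / |A|: fraction of A's edges whose reverse appears in B."""
    if not a:
        return 0.0
    return len(a & transpose(b)) / len(a)


S = {("barcelona", "barcelona hotels"),
     ("barcelona hotels", "cheap barcelona hotels")}
G = {("barcelona hotels", "barcelona")}
P = {("barcelona", "rome"), ("rome", "barcelona")}

print(reverse_overlap(S, G))  # 0.5
print(reverse_overlap(P, P))  # 1.0
print(reverse_overlap(S, S))  # 0.0
```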

SLIDE 24

Entropy measures

Transition-type entropy

Maximum 2 bits (4 transition types)

Next-query entropy

Maximum log2(|Queries| − 1), reached when every other query is equally likely to come next

Note: the US data set was large, so entries with count = 1 were dropped
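
Both measures are Shannon entropies in bits. A sketch of the computation from raw counts: with four equally likely transition types it reaches the 2-bit maximum stated above. The function name and the input counts are made up for illustration.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (base 2) of the distribution given by non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


print(entropy_bits([10, 10, 10, 10]))  # 2.0: four equally likely types
print(entropy_bits([37, 1, 1, 1]))     # much lower: one type dominates
```

For next-query entropy, the same function would be applied to the counts of the queries observed immediately after a given query.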

SLIDE 25

Average entropy (freq > 100)

Specialization: 2^5.4 ≈ 42, 2^2.6 ≈ 6
Parallel move: 2^6.5 ≈ 91, 2^4.0 = 16
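
Reading each entry as an entropy H in bits paired with 2^H, the equivalent number of equally likely outcomes (perplexity), the arithmetic can be checked directly. Since these values exceed the 2-bit cap on transition-type entropy, they presumably refer to next-query entropy; the function name is hypothetical.

```python
def perplexity(bits: float) -> float:
    """Number of equally likely outcomes with the same entropy: 2 ** H."""
    return 2.0 ** bits


for label, h in [("Specialization", 5.4), ("Specialization", 2.6),
                 ("Parallel move", 6.5), ("Parallel move", 4.0)]:
    print(f"{label}: H = {h} bits -> about {round(perplexity(h))} "
          f"equally likely next queries")
```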

SLIDE 26

Conclusions

High accuracy in the 4-class problem: 92%
Specializations and generalizations alternate
Corrections are common at the beginning and at the end of a chain
Large entropy in specializations/parallel moves
Follow-up work: query recommendation

SLIDE 27

Q&A