SLIDE 1

Query Session Detection as a Cascade

Matthias Hagen, Benno Stein, Tino Rüb

Bauhaus-Universität Weimar
matthias.hagen@uni-weimar.de

CIKM 2011, Glasgow, Scotland, October 25, 2011

Hagen, Stein, Rüb: Query Session Detection as a Cascade 1

SLIDE 3

It’s quiz time!

What is the user searching for?

paris hilton

SLIDE 4

Without context . . .

paris hilton

source: [http://upload.wikimedia.org/wikipedia/commons/2/26/Paris Hilton 3 Crop.jpg]

SLIDE 6

What if you knew the previous queries?

paris hotels
paris marriott
paris hyatt
paris hilton

sources: [http://www.alison-anderson.com/wp-content/uploads/hilton hotel paris 2.jpg] [http://maps.google.com/] [http://upload.wikimedia.org/wikipedia/en/e/eb/HI mk logo hiltonbrandlogo.jpg]

SLIDE 8

Query sessions: same information need

The benefits

  • Improved understanding of user intent
  • Improved retrieval performance via session knowledge

The "minor" issue

Users do not announce when querying for a new information need.

SLIDE 9

A typical query log

User  Query                Click domain + Click rank  Time
42    istanbul             en.wikipedia.org 1         2011-10-22 20:34:17
42    istanbul archeology                             2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr 6            2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr 13        2011-10-23 18:24:07
42    constantinople                                  2011-10-23 19:12:40
42    constantinople       en.wikipedia.org 4         2011-10-23 19:13:02
42    soccr glasgo                                    2011-10-23 19:16:01
42    soccer glasgow                                  2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk 3            2011-10-23 19:16:15
42    celtics vs rangers                              2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org 5         2011-10-23 20:33:12
42    old firm                                        2011-10-23 22:42:48

SLIDE 10

How to determine the break points?

User  Query                Click domain + Click rank  Time
42    istanbul             en.wikipedia.org 1         2011-10-22 20:34:17
42    istanbul archeology                             2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr 6            2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr 13        2011-10-23 18:24:07
42    constantinople                                  2011-10-23 19:12:40
42    constantinople       en.wikipedia.org 4         2011-10-23 19:13:02
-------------------------------------------------------------------------
42    soccr glasgo                                    2011-10-23 19:16:01
42    soccer glasgow                                  2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk 3            2011-10-23 19:16:15
42    celtics vs rangers                              2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org 5         2011-10-23 20:33:12
42    old firm                                        2011-10-23 22:42:48

SLIDE 11

The key is . . . Automatic query session detection

SLIDE 12

Automatic query session detection

Usual "technique"

For each pair of consecutive queries, check whether they belong to the same information need or to a new one.

Example

42  istanbul             2011-10-22 20:34:17
                                                same
42  istanbul archeology  2011-10-23 18:24:07
                                                same
42  constantinople       2011-10-23 19:12:40
--------------------------------------------    new
42  soccer glasgow       2011-10-23 19:16:11
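The pairwise check can be sketched as a single pass over one user's queries. This is a minimal illustration, not the authors' implementation: `split_into_sessions` and `oracle` are hypothetical names, and the oracle simply hard-codes the same/new labels from the example instead of computing them.

```python
from typing import Callable, List


def split_into_sessions(queries: List[str],
                        same_need: Callable[[str, str], bool]) -> List[List[str]]:
    """Group chronologically ordered queries of one user into sessions."""
    sessions: List[List[str]] = []
    for query in queries:
        if sessions and same_need(sessions[-1][-1], query):
            sessions[-1].append(query)   # same information need: continue
        else:
            sessions.append([query])     # break point: start a new session
    return sessions


# Hypothetical stand-in for a real detector: the hand-labelled "same" pairs
# from the example on this slide.
LABELLED_SAME = {
    ("istanbul", "istanbul archeology"),
    ("istanbul archeology", "constantinople"),
}


def oracle(previous: str, current: str) -> bool:
    return (previous, current) in LABELLED_SAME


log = ["istanbul", "istanbul archeology", "constantinople", "soccer glasgow"]
sessions = split_into_sessions(log, oracle)
print(sessions)
```

Any real detector only has to replace `oracle`; the segmentation loop stays the same.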

SLIDE 13

Typical features

Temporal thresholds

  • 5 minutes [Silverstein et al., 1999]
  • 10–15 minutes [He and Göker, 2000]
  • 30 minutes [Downey et al., 2007]
  • user specific [Murray et al., 2006]

Lexical similarity

  • n-gram overlap [Zhang and Moffat, 2006]
  • Levenshtein distance [Jones and Klinkner, 2008]

Semantic similarity

  • Search results [Radlinski and Joachims, 2005]
  • ESA [Lucchese et al., 2011]
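To make the cheaper feature families concrete, here is a rough sketch of one temporal and two lexical features. The details are assumptions for illustration: the cited papers may define their features differently, the 30-minute default is just one of the thresholds listed above, and character trigrams with Jaccard overlap are only one possible reading of "n-gram overlap".

```python
from datetime import datetime, timedelta


def exceeds_time_gap(earlier: datetime, later: datetime,
                     threshold: timedelta = timedelta(minutes=30)) -> bool:
    """Temporal feature: True if the pause suggests a new session."""
    return later - earlier > threshold


def char_ngrams(text: str, n: int = 3) -> set:
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Lexical feature: Jaccard overlap of character n-grams (assumed variant)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def levenshtein(a: str, b: str) -> int:
    """Lexical feature: edit distance via the standard dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Two timestamps from the example log: the user returns to
# "istanbul archeology" after more than six hours.
gap_is_large = exceeds_time_gap(datetime(2011, 10, 23, 12, 3, 15),
                                datetime(2011, 10, 23, 18, 24, 7))
typo_distance = levenshtein("soccr glasgo", "soccer glasgow")
print(gap_is_large, typo_distance, ngram_overlap("istanbul", "constantinople"))
```

On the example log, a pure time threshold would split the two "istanbul archeology" queries, which is one reason single features tend to be combined.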

SLIDE 15

Previous methods

Feature combinations

  • More accurate than single features
  • One of the best: Geometric method (time + lexical) [Gayo-Avello, 2009]

Shortcomings

  • All features evaluated simultaneously → runtime
  • Geometric method ignores semantics → accuracy

Examples

  • Subset test suffices: "soccer" and "soccer glasgow" are the same information need
  • Geometric method fails: "celtics vs rangers" and "old firm" are the same information need

SLIDE 16

We address the shortcomings in a cascade . . .

source: [http://wp.ltchambon.com/wp-content/uploads/2010/09/Cascade-de-Tufs-Baume-les-messieurs-Jura.jpg]

SLIDE 18

. . . well . . . a small 4-step cascade

source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg]

Step 1: Subset test
  ↓
Step 2: Geometric method
  ↓
Step 3: ESA similarity
  ↓
Step 4: Search results

Basic Idea

  • Feature cost (runtime) increases from step to step.
  • Expensive features are computed only if the previous steps are "unreliable."
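The control flow behind this idea might look like the sketch below. It is an assumption-laden illustration rather than the authors' code: each step is modelled as a function returning `True` (same need), `False` (new need), or `None` (unreliable, ask the next step), and `cheap_step` and `expensive_step` are hypothetical placeholders.

```python
from typing import Callable, Optional, Sequence, Tuple

# A step answers True/False, or None when it cannot decide reliably.
Step = Callable[[str, str], Optional[bool]]


def cascade_decision(previous: str, current: str, steps: Sequence[Step],
                     default: bool = False) -> Tuple[bool, int]:
    """Run steps in order of cost; stop at the first reliable verdict."""
    for index, step in enumerate(steps, start=1):
        verdict = step(previous, current)
        if verdict is not None:
            return verdict, index
    return default, len(steps)


calls = {"expensive": 0}  # counts how often the costly feature is computed


def cheap_step(previous: str, current: str) -> Optional[bool]:
    a, b = set(previous.split()), set(current.split())
    return True if a <= b or b <= a else None


def expensive_step(previous: str, current: str) -> Optional[bool]:
    calls["expensive"] += 1
    return False  # placeholder verdict for the illustration


steps = [cheap_step, expensive_step]
first = cascade_decision("soccer", "soccer glasgow", steps)
calls_after_first = calls["expensive"]
second = cascade_decision("constantinople", "soccr glasgo", steps)
print(first, second, calls["expensive"])
```

The counter shows the intended effect: the expensive feature is never computed for pairs that the cheap step already decides.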

SLIDE 19

Step 1: Subset test

User  Query                Click domain + Click rank  Time
42    istanbul             en.wikipedia.org 1         2011-10-22 20:34:17
42    istanbul archeology                             2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr 6            2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr 13        2011-10-23 18:24:07
-------------------------------------------------------------------------
42    constantinople                                  2011-10-23 19:12:40
42    constantinople       en.wikipedia.org 4         2011-10-23 19:13:02
-------------------------------------------------------------------------
42    soccr glasgo                                    2011-10-23 19:16:01
-------------------------------------------------------------------------
42    soccer glasgow                                  2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk 3            2011-10-23 19:16:15
-------------------------------------------------------------------------
42    celtics vs rangers                              2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org 5         2011-10-23 20:33:12
-------------------------------------------------------------------------
42    old firm                                        2011-10-23 22:42:48
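The grouping shown for Step 1 can be reproduced with a term-level subset test. This is a sketch under assumptions: `term_subset` is a hypothetical helper, the exact conditions of the authors' Step 1 are not given on the slide, and a failed test is treated here as a provisional break, whereas in the cascade such pairs are handed on to Step 2.

```python
def term_subset(previous: str, current: str) -> bool:
    """True if one query's terms are contained in the other's
    (repetition, specialization, or generalization)."""
    a, b = set(previous.lower().split()), set(current.lower().split())
    return a <= b or b <= a


# Query column of the example log, in chronological order.
queries = [
    "istanbul", "istanbul archeology", "istanbul archeology",
    "istanbul archeology", "constantinople", "constantinople",
    "soccr glasgo", "soccer glasgow", "soccer glasgow",
    "celtics vs rangers", "celtics vs rangers", "old firm",
]

sessions = []
for query in queries:
    if sessions and term_subset(sessions[-1][-1], query):
        sessions[-1].append(query)   # decided by Step 1: same need
    else:
        sessions.append([query])     # undecided here: provisional break
print(len(sessions))
for session in sessions:
    print(session)
```

The misspelled "soccr glasgo" stays separate from "soccer glasgow" at this stage because the test compares whole terms.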

SLIDE 20

Step 2: Geometric method

[Gayo-Avello, 2009]

User  Query                Click domain + Click rank  Time
42    istanbul             en.wikipedia.org 1         2011-10-22 20:34:17
42    istanbul archeology                             2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr 6            2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr 13        2011-10-23 18:24:07
-------------------------------------------------------------------------
42    constantinople                                  2011-10-23 19:12:40
42    constantinople       en.wikipedia.org 4         2011-10-23 19:13:02
-------------------------------------------------------------------------
42    soccr glasgo                                    2011-10-23 19:16:01
42    soccer glasgow                                  2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk 3            2011-10-23 19:16:15
-------------------------------------------------------------------------
42    celtics vs rangers                              2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org 5         2011-10-23 20:33:12
-------------------------------------------------------------------------
42    old firm                                        2011-10-23 22:42:48
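The slide does not spell out the geometric method's formula. The sketch below assumes one common description of it: a time feature that decays linearly to zero over 24 hours, a lexical feature given by the cosine similarity of character 3- to 5-grams, and a "same session" verdict when the point (time, lexical) lies on or outside the unit circle. It also compares only consecutive queries, whereas the published method may compare a query with the whole preceding session, so treat every detail as an assumption.

```python
import math
from collections import Counter

SECONDS_PER_DAY = 24 * 60 * 60


def char_ngram_counts(text: str, sizes=(3, 4, 5)) -> Counter:
    counts = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def geometric_same(previous: str, current: str, gap_seconds: float) -> bool:
    """Assumed rule: same session iff sqrt(f_time^2 + f_lex^2) >= 1."""
    f_time = max(0.0, 1.0 - gap_seconds / SECONDS_PER_DAY)
    f_lex = cosine(char_ngram_counts(previous), char_ngram_counts(current))
    return math.hypot(f_time, f_lex) >= 1.0


# Gaps taken from the timestamps in the example log.
typo_fix = geometric_same("soccr glasgo", "soccer glasgow", 10)
derby = geometric_same("celtics vs rangers", "old firm", 7776)
print(typo_fix, derby)
```

Under these assumptions the typo correction is merged, while "celtics vs rangers" and "old firm" share no character n-grams and stay apart, which is the failure case the later semantic steps are meant to catch.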

SLIDE 21

Step 3: Explicit Semantic Analysis

[Gabrilovich and Markovitch, 2007]

User  Query                Click domain + Click rank  Time
42    istanbul             en.wikipedia.org 1         2011-10-22 20:34:17
42    istanbul archeology                             2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr 6            2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr 13        2011-10-23 18:24:07
42    constantinople                                  2011-10-23 19:12:40
42    constantinople       en.wikipedia.org 4         2011-10-23 19:13:02
-------------------------------------------------------------------------
42    soccr glasgo                                    2011-10-23 19:16:01
42    soccer glasgow                                  2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk 3            2011-10-23 19:16:15
42    celtics vs rangers                              2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org 5         2011-10-23 20:33:12
-------------------------------------------------------------------------
42    old firm                                        2011-10-23 22:42:48
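Explicit Semantic Analysis represents a text by its association with a collection of concept documents (Wikipedia articles in the original work) and compares texts by the cosine of these concept vectors. The toy sketch below only illustrates that idea: the three "articles" are invented stand-ins, plain term counts replace the TF-IDF weighting of a real ESA index, and `concept_vector` and `esa_similarity` are hypothetical names.

```python
import math

# Invented miniature concept collection; a real ESA index is built from
# a large Wikipedia dump with TF-IDF weights.
CONCEPTS = {
    "Istanbul": "istanbul constantinople byzantium bosphorus city archeology museum",
    "Glasgow": "glasgow scotland city soccer celtics rangers",
    "Old Firm": "old firm derby celtics rangers glasgow soccer",
}


def concept_vector(query: str) -> list:
    """One weight per concept: how many query terms its text contains."""
    terms = query.lower().split()
    return [sum(text.split().count(term) for term in terms)
            for text in CONCEPTS.values()]


def esa_similarity(a: str, b: str) -> float:
    vec_a, vec_b = concept_vector(a), concept_vector(b)
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = (math.sqrt(sum(x * x for x in vec_a))
            * math.sqrt(sum(y * y for y in vec_b)))
    return dot / norm if norm else 0.0


related = esa_similarity("istanbul archeology", "constantinople")
unrelated = esa_similarity("constantinople", "soccr glasgo")
print(related, unrelated)
```

The two Istanbul queries share no words but map to the same concept, which is the kind of relation a purely lexical feature cannot see.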

SLIDE 22

Step 4: Search results

User  Query                Click domain + Click rank  Time
42    istanbul             en.wikipedia.org 1         2011-10-22 20:34:17
42    istanbul archeology                             2011-10-23 12:02:54
42    istanbul archeology  www.turizm.tr 6            2011-10-23 12:03:15
42    istanbul archeology  www.arkeoloji.tr 13        2011-10-23 18:24:07
42    constantinople                                  2011-10-23 19:12:40
42    constantinople       en.wikipedia.org 4         2011-10-23 19:13:02
-------------------------------------------------------------------------
42    soccr glasgo                                    2011-10-23 19:16:01
42    soccer glasgow                                  2011-10-23 19:16:11
42    soccer glasgow       www.soccer.uk 3            2011-10-23 19:16:15
42    celtics vs rangers                              2011-10-23 20:33:04
42    celtics vs rangers   en.wikipedia.org 5         2011-10-23 20:33:12
42    old firm                                        2011-10-23 22:42:48
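How the search results are compared is not stated on the slide. One plausible reading, assumed here, is to fetch the top results for both queries and measure the overlap of the returned URLs. In this sketch the result lists are invented `example.org` URLs, `fetch` stands for whatever search API would be used, and the 0.3 threshold is arbitrary.

```python
from typing import Callable, List


def result_overlap(results_a: List[str], results_b: List[str]) -> float:
    """Jaccard overlap of two result URL lists (assumed comparison)."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def same_by_results(previous: str, current: str,
                    fetch: Callable[[str], List[str]],
                    threshold: float = 0.3) -> bool:
    # Fetching results is the expensive part, so this step comes last.
    return result_overlap(fetch(previous), fetch(current)) >= threshold


# Invented result lists standing in for a real search engine response.
FAKE_INDEX = {
    "celtics vs rangers": ["https://example.org/old-firm-derby",
                           "https://example.org/celtic-fc",
                           "https://example.org/rangers-fc"],
    "old firm": ["https://example.org/old-firm-derby",
                 "https://example.org/celtic-fc",
                 "https://example.org/glasgow-football-history"],
}


def fake_fetch(query: str) -> List[str]:
    return FAKE_INDEX.get(query, [])


overlap = result_overlap(fake_fetch("celtics vs rangers"), fake_fetch("old firm"))
merged = same_by_results("celtics vs rangers", "old firm", fake_fetch)
print(overlap, merged)
```

With shared results the two queries end up in one session, matching the final grouping in the table.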

SLIDE 24

That’s the complete cascade

source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg]

Step 1: Subset test
  ↓
Step 2: Geometric method
  ↓
Step 3: ESA similarity
  ↓
Step 4: Search results

What about accuracy and runtime?

SLIDE 26

Experimental Evaluation

Accuracy on Gayo-Avello’s corpus (11 000 queries, 2.7 per session)

           Precision  Recall  F-Measure (β = 1.5)
Geometric  0.8673     0.9431  0.9184
Cascading  0.8618     0.9676  0.9328

Performance per step

        decides  F-Measure  time     factor
Step 1  40.49%   0.8303     0.08 ms    1.0
Step 2  35.15%   0.9292     0.20 ms    2.5
Step 3   2.05%   0.9316     0.27 ms    3.4
Step 4   0.85%   0.9328     9.85 ms  123.1

Remark: Without Step 4 about 2 700 queries per second!
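As a quick check on the tables, the sketch below recomputes the F-measure with the standard F_β formula, F = (1 + β²)·P·R / (β²·P + R), which weights recall more heavily for β = 1.5, and derives the time factors relative to Step 1. It assumes the slide uses this standard definition; because the inputs are rounded values from the slide, the results agree with the table only to about three decimal places.

```python
def f_measure(precision: float, recall: float, beta: float = 1.5) -> float:
    """Standard F-beta score; beta > 1 favours recall over precision."""
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


geometric = f_measure(0.8673, 0.9431)
cascading = f_measure(0.8618, 0.9676)

# Per-step times from the table, expressed relative to Step 1.
step_times_ms = [0.08, 0.20, 0.27, 9.85]
factors = [round(t / step_times_ms[0], 1) for t in step_times_ms]

print(round(geometric, 4), round(cascading, 4), factors)
```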

SLIDE 27

Almost the end: The take-away messages!

SLIDE 30

What we have (not) done

Results

  • Cascading method
  • Cheap features first
  • Beats the geometric method
  • 3-step version: simple, fast, high-quality sessions

Future Work

  • Postprocessing for multi-tasking
  • Postprocessing for goals/missions

Thank you
