


SLIDE 1

Improved Cascade for Search Mission Detection

Matthias Hagen, Jakob Gomoll, Benno Stein

Bauhaus-Universität Weimar
matthias.hagen@uni-weimar.de

SIR 2012, Barcelona, Spain
April 1, 2012

Hagen, Gomoll, Stein Improved Cascade for Search Mission Detection 1

SLIDE 2

What is the user searching for?

bar celona

SLIDE 3

Without context . . .

new york nightlife
new york clubs
new york bars
bar celona

source: [http://ecir2012.upf.edu/images/header.jpg]

SLIDE 5

What if you knew the previous queries?

new york nightlife
new york clubs
new york bars
bar celona

sources: [http://barcelonaloungenyc.com/] [http://maps.google.com]

SLIDE 6

Query sessions: same information need

Knowing sessions can improve:

  • Understanding of user intent
  • Retrieval performance

SLIDE 7

A typical query log

User | Query               | Click domain + Click rank | Time
42   | istanbul            | en.wikipedia.org 1        | 2012-03-22 20:34:17
42   | istanbul archeology |                           | 2012-03-23 12:02:54
42   | istanbul archeology | www.turizm.tr 6           | 2012-03-23 12:03:15
42   | istanbul archeology | www.arkeoloji.tr 13       | 2012-03-23 18:24:07
42   | constantinople      |                           | 2012-03-23 19:12:40
42   | constantinople      | en.wikipedia.org 4        | 2012-03-23 19:13:02
42   | football barclona   |                           | 2012-03-23 19:16:01
42   | football barcelona  |                           | 2012-03-23 19:16:11
42   | football barcelona  | www.football.es 3         | 2012-03-23 19:16:15
42   | real vs barca       |                           | 2012-03-23 20:33:04
42   | real vs barca       | en.wikipedia.org 5        | 2012-03-23 20:33:12
42   | el clasico          |                           | 2012-03-23 22:42:48
42   | constantinople      |                           | 2012-03-24 10:17:09

SLIDE 8

Highlighted sessions

User | Query               | Click domain + Click rank | Time
42   | istanbul            | en.wikipedia.org 1        | 2012-03-22 20:34:17
42   | istanbul archeology |                           | 2012-03-23 12:02:54
42   | istanbul archeology | www.turizm.tr 6           | 2012-03-23 12:03:15
42   | istanbul archeology | www.arkeoloji.tr 13       | 2012-03-23 18:24:07
42   | constantinople      |                           | 2012-03-23 19:12:40
42   | constantinople      | en.wikipedia.org 4        | 2012-03-23 19:13:02
----------------------------------------------------------------------------
42   | football barclona   |                           | 2012-03-23 19:16:01
42   | football barcelona  |                           | 2012-03-23 19:16:11
42   | football barcelona  | www.football.es 3         | 2012-03-23 19:16:15
42   | real vs barca       |                           | 2012-03-23 20:33:04
42   | real vs barca       | en.wikipedia.org 5        | 2012-03-23 20:33:12
42   | el clasico          |                           | 2012-03-23 22:42:48
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-24 10:17:09

SLIDE 10

Multitasking and search missions

Observations [Spink et al., 2006; Jones and Klinkner, 2008]

  • Search intents interleaved → Multitasking
  • Long-term tasks with several sessions → Search missions

Session detection

  • Focused on consecutive queries → Misses multitasking/missions

Example

User | Query               | Time                | Decision vs. previous query
42   | istanbul            | 2012-03-22 20:34:17 |
42   | istanbul archeology | 2012-03-23 18:24:07 | same
----------------------------------------------------------------------------
42   | football barcelona  | 2012-03-23 19:16:11 | new
----------------------------------------------------------------------------
42   | constantinople      | 2012-03-24 10:17:09 | new

SLIDE 11

Our topic . . .

Session detection + Multitasking/missions

SLIDE 12

Typical query similarity features

Temporal thresholds

  • 5 minutes [Silverstein et al., 1999]
  • 10–15 minutes [He and Göker, 2000]
  • 30 minutes [Downey et al., 2007]
  • user specific [Murray et al., 2006]

Lexical similarity

  • n-gram overlap [Zhang and Moffat, 2006]
  • Levenshtein distance [Jones and Klinkner, 2008]

Semantic similarity

  • Search results [Radlinski and Joachims, 2005]
  • ESA [Lucchese et al., 2011]
  • Linked Open Data [Hollink et al., 2011]
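
The temporal and lexical features listed above are simple enough to sketch in a few lines. The following is a minimal illustration, not the cited authors' implementations: the function names, the 30-minute default, the trigram size, and the use of Jaccard overlap for "n-gram overlap" are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def within_time_gap(t_prev, t_next, max_gap=timedelta(minutes=30)):
    """Temporal feature: True if two queries are at most max_gap apart.

    The 30-minute default is just one of the thresholds listed on the slide.
    """
    return (t_next - t_prev) <= max_gap


def char_ngrams(text, n=3):
    """Set of character n-grams of a query string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_overlap(q1, q2, n=3):
    """Lexical feature (assumed variant): Jaccard overlap of n-gram sets."""
    a, b = char_ngrams(q1, n), char_ngrams(q2, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def levenshtein(a, b):
    """Lexical feature: edit distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]
```

On the example log, `levenshtein("football barclona", "football barcelona")` is a single insertion, which is why a lexical feature can treat the typo and its correction as the same information need.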

SLIDE 13

Our last year’s cascade . . .

[Hagen et al., 2011]

source: [http://wp.ltchambon.com/wp-content/uploads/2010/09/Cascade-de-Tufs-Baume-les-messieurs-Jura.jpg]

SLIDE 15

. . . well . . . it looks more like this

[Hagen et al., 2011]

source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg]

Step 1: Subset test
↘ Step 2: Geometric method
↘ Step 3: ESA similarity
↙ Step 4: Search Results

Basic Idea

  • Feature cost (runtime) increases from step to step.
  • Expensive features are used only if the previous steps are “unreliable.”

SLIDE 16

. . . well . . . it looks more like this (improved)

source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg]

Step 1: Subset test
↘ Step 2: Geometric method
↘ Step 3: ESA similarity
↙ Step 4: Linked Open Data

Basic Idea

  • Feature cost (runtime) increases from step to step.
  • Expensive features are used only if the previous steps are “unreliable.”
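
The control flow behind this basic idea can be sketched as a loop over steps ordered by cost. This is only an illustration under assumptions: the convention that a step returns `None` when it considers itself unreliable, the fallback decision, and the two stand-in steps are hypothetical and not taken from the talk.

```python
def run_cascade(prev_query, next_query, steps):
    """Try steps from cheapest to most expensive; stop at the first decision.

    Each step returns "same", "new", or None (assumed to mean "unreliable,
    ask the next step"). Returns (decision, name_of_deciding_step).
    """
    for name, step in steps:
        decision = step(prev_query, next_query)
        if decision is not None:
            return decision, name
    return "new", "fallback"  # assumed default if no step decides


calls = []  # records which steps actually ran


def cheap_step(prev_query, next_query):
    """Stand-in for a cheap feature: only decides on exact repetitions."""
    calls.append("cheap")
    return "same" if prev_query == next_query else None


def costly_step(prev_query, next_query):
    """Stand-in for an expensive feature that always reaches a decision."""
    calls.append("costly")
    return "new"


STEPS = [("cheap", cheap_step), ("costly", costly_step)]
```

Calling `run_cascade("el clasico", "el clasico", STEPS)` is decided by the cheap step alone, so `costly_step` never runs; that early exit is where the runtime savings come from.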

SLIDE 17

Step 1: Subset test

User | Query               | Click domain + Click rank | Time
42   | istanbul            | en.wikipedia.org 1        | 2012-03-22 20:34:17
42   | istanbul archeology |                           | 2012-03-23 12:02:54
42   | istanbul archeology | www.turizm.tr 6           | 2012-03-23 12:03:15
42   | istanbul archeology | www.arkeoloji.tr 13       | 2012-03-23 18:24:07
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-23 19:12:40
42   | constantinople      | en.wikipedia.org 4        | 2012-03-23 19:13:02
----------------------------------------------------------------------------
42   | football barclona   |                           | 2012-03-23 19:16:01
----------------------------------------------------------------------------
42   | football barcelona  |                           | 2012-03-23 19:16:11
42   | football barcelona  | www.football.es 3         | 2012-03-23 19:16:15
----------------------------------------------------------------------------
42   | real vs barca       |                           | 2012-03-23 20:33:04
42   | real vs barca       | en.wikipedia.org 5        | 2012-03-23 20:33:12
----------------------------------------------------------------------------
42   | el clasico          |                           | 2012-03-23 22:42:48
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-24 10:17:09
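
The grouping shown above is consistent with a term-subset check on consecutive queries (repetition, specialization, or generalization). The sketch below assumes whitespace tokenization and lowercasing; it also treats every failed test as a boundary, whereas in the cascade such pairs would presumably be handed on to the later steps.

```python
def is_subset_pair(prev_query, next_query):
    """True if one query's term set is contained in the other's."""
    a = set(prev_query.lower().split())
    b = set(next_query.lower().split())
    return a <= b or b <= a


def split_by_subset(queries):
    """Group consecutive queries as long as the subset test holds."""
    groups = []
    for query in queries:
        if groups and is_subset_pair(groups[-1][-1], query):
            groups[-1].append(query)
        else:
            groups.append([query])
    return groups


# Query column of the example log, in order.
LOG = [
    "istanbul", "istanbul archeology", "istanbul archeology",
    "istanbul archeology", "constantinople", "constantinople",
    "football barclona", "football barcelona", "football barcelona",
    "real vs barca", "real vs barca", "el clasico", "constantinople",
]
GROUPS = split_by_subset(LOG)
```

Run on the log's query column, this yields seven groups, matching the boundaries in the table; note that the typo "football barclona" stays separate from "football barcelona" because neither term set contains the other.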

SLIDE 18

Step 2: Geometric method

[Gayo-Avello, 2009]

User | Query               | Click domain + Click rank | Time
42   | istanbul            | en.wikipedia.org 1        | 2012-03-22 20:34:17
42   | istanbul archeology |                           | 2012-03-23 12:02:54
42   | istanbul archeology | www.turizm.tr 6           | 2012-03-23 12:03:15
42   | istanbul archeology | www.arkeoloji.tr 13       | 2012-03-23 18:24:07
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-23 19:12:40
42   | constantinople      | en.wikipedia.org 4        | 2012-03-23 19:13:02
----------------------------------------------------------------------------
42   | football barclona   |                           | 2012-03-23 19:16:01
42   | football barcelona  |                           | 2012-03-23 19:16:11
42   | football barcelona  | www.football.es 3         | 2012-03-23 19:16:15
----------------------------------------------------------------------------
42   | real vs barca       |                           | 2012-03-23 20:33:04
42   | real vs barca       | en.wikipedia.org 5        | 2012-03-23 20:33:12
----------------------------------------------------------------------------
42   | el clasico          |                           | 2012-03-23 22:42:48
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-24 10:17:09
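
A rough sketch of a geometric combination of a temporal and a lexical feature follows. The specifics are assumptions rather than a restatement of [Gayo-Avello, 2009]: a time feature that decays linearly over 24 hours, a cosine similarity over character 3- to 5-grams of the two queries, and a "same session" vote when the point (f_time, f_lex) lies on or outside the unit circle.

```python
import math
from collections import Counter
from datetime import datetime

DAY_SECONDS = 24 * 60 * 60  # assumed horizon of the temporal feature


def gram_counts(text, sizes=(3, 4, 5)):
    """Counts of character n-grams for every n in sizes."""
    counts = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cosine(c1, c2):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(c1[g] * c2[g] for g in c1.keys() & c2.keys())
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0


def geometric_same_session(prev_query, prev_time, next_query, next_time):
    """Vote "same session" if (f_time, f_lex) is at distance >= 1 from origin."""
    gap = (next_time - prev_time).total_seconds()
    f_time = max(0.0, 1.0 - gap / DAY_SECONDS)
    f_lex = cosine(gram_counts(prev_query), gram_counts(next_query))
    return math.hypot(f_time, f_lex) >= 1.0
```

Under these assumptions, "football barclona" followed ten seconds later by "football barcelona" is merged, while "constantinople" followed three minutes later by "football barclona" stays a boundary because the two queries share no character n-grams; both outcomes agree with the table above.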

SLIDE 19

Step 3: Explicit Semantic Analysis

[Gabrilovich and Markovitch, 2007]

User | Query               | Click domain + Click rank | Time
42   | istanbul            | en.wikipedia.org 1        | 2012-03-22 20:34:17
42   | istanbul archeology |                           | 2012-03-23 12:02:54
42   | istanbul archeology | www.turizm.tr 6           | 2012-03-23 12:03:15
42   | istanbul archeology | www.arkeoloji.tr 13       | 2012-03-23 18:24:07
42   | constantinople      |                           | 2012-03-23 19:12:40
42   | constantinople      | en.wikipedia.org 4        | 2012-03-23 19:13:02
----------------------------------------------------------------------------
42   | football barclona   |                           | 2012-03-23 19:16:01
42   | football barcelona  |                           | 2012-03-23 19:16:11
42   | football barcelona  | www.football.es 3         | 2012-03-23 19:16:15
42   | real vs barca       |                           | 2012-03-23 20:33:04
42   | real vs barca       | en.wikipedia.org 5        | 2012-03-23 20:33:12
----------------------------------------------------------------------------
42   | el clasico          |                           | 2012-03-23 22:42:48
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-24 10:17:09
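
The idea behind ESA is to map each text to a vector over concepts (Wikipedia articles in the original work) and compare those vectors by cosine. The toy sketch below only illustrates that idea: the three "concept" texts are invented stand-ins for articles, and the tf-idf weighting is a simplified assumption, not the setup used in the talk.

```python
import math
from collections import Counter

# Hypothetical miniature concept texts standing in for Wikipedia articles.
CONCEPTS = {
    "Istanbul": "istanbul constantinople bosphorus city turkey",
    "Football": "football barcelona real madrid clasico match",
    "Archaeology": "archeology excavation museum istanbul",
}


def build_index(concepts):
    """Inverted index: term -> {concept name: tf-idf weight}."""
    tf = {name: Counter(text.split()) for name, text in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(concepts)
    index = {}
    for name, counts in tf.items():
        for term, freq in counts.items():
            index.setdefault(term, {})[name] = freq * math.log(n / df[term])
    return index


def concept_vector(query, index):
    """Sum the concept weights of all query terms."""
    vec = Counter()
    for term in query.lower().split():
        for name, weight in index.get(term, {}).items():
            vec[name] += weight
    return vec


def esa_similarity(q1, q2, index):
    """Cosine similarity of the two queries in concept space."""
    v1, v2 = concept_vector(q1, index), concept_vector(q2, index)
    dot = sum(v1[c] * v2[c] for c in v1.keys() & v2.keys())
    norm = (math.sqrt(sum(w * w for w in v1.values()))
            * math.sqrt(sum(w * w for w in v2.values())))
    return dot / norm if norm else 0.0


INDEX = build_index(CONCEPTS)
```

With this toy index, "istanbul" and "constantinople" share no terms but still end up similar because both activate the same concept, which is the kind of link that lets this step merge the first two groups in the table above.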

SLIDE 20

Step 4: Linked Open Data connections

[Hollink et al., 2011]

User | Query               | Click domain + Click rank | Time
42   | istanbul            | en.wikipedia.org 1        | 2012-03-22 20:34:17
42   | istanbul archeology |                           | 2012-03-23 12:02:54
42   | istanbul archeology | www.turizm.tr 6           | 2012-03-23 12:03:15
42   | istanbul archeology | www.arkeoloji.tr 13       | 2012-03-23 18:24:07
42   | constantinople      |                           | 2012-03-23 19:12:40
42   | constantinople      | en.wikipedia.org 4        | 2012-03-23 19:13:02
----------------------------------------------------------------------------
42   | football barclona   |                           | 2012-03-23 19:16:01
42   | football barcelona  |                           | 2012-03-23 19:16:11
42   | football barcelona  | www.football.es 3         | 2012-03-23 19:16:15
42   | real vs barca       |                           | 2012-03-23 20:33:04
42   | real vs barca       | en.wikipedia.org 5        | 2012-03-23 20:33:12
42   | el clasico          |                           | 2012-03-23 22:42:48
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-24 10:17:09
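
One way to picture a Linked Open Data connection is as a short path between two entities in a graph. The sketch below is purely illustrative: the toy edge list, the assumption that queries have already been linked to entities, and the two-hop limit are all hypothetical and not taken from [Hollink et al., 2011] or the talk.

```python
from collections import deque

# Hypothetical toy graph; a real setup would query a Linked Open Data source.
EDGES = [
    ("El Clasico", "FC Barcelona"),
    ("El Clasico", "Real Madrid"),
    ("Istanbul", "Constantinople"),
    ("Istanbul", "Turkey"),
]


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops(graph, start, goal):
    """Breadth-first search: edges on a shortest path, or None if none exists."""
    if start not in graph or goal not in graph:
        return None
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def lod_related(graph, entity_a, entity_b, max_hops=2):
    """Treat two entities as related if they are at most max_hops apart."""
    dist = hops(graph, entity_a, entity_b)
    return dist is not None and dist <= max_hops


GRAPH = build_graph(EDGES)
```

In this toy graph, "Real Madrid" and "FC Barcelona" are two hops apart via "El Clasico", mirroring how the table above attaches "el clasico" to the "real vs barca" queries.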

SLIDE 21

What about multitasking/missions?

Idea

Run the cascade twice:

  1. Session detection on query level
  2. Multitasking/mission detection on session level
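
The two runs can be sketched as two grouping passes: the first compares each query only with its predecessor, the second compares each session with all earlier sessions. In this illustration the decision functions are passed in as parameters; `share_term` is a toy stand-in for the cascade at session level, and the input sessions are simply assumed to be the output of the first run.

```python
def detect_sessions(queries, same_session):
    """Run 1: group consecutive queries using a pairwise decision."""
    sessions = []
    for query in queries:
        if sessions and same_session(sessions[-1][-1], query):
            sessions[-1].append(query)
        else:
            sessions.append([query])
    return sessions


def detect_missions(sessions, same_mission):
    """Run 2: attach each session to the first earlier mission it continues."""
    missions = []
    for session in sessions:
        for mission in missions:
            if any(same_mission(earlier, session) for earlier in mission):
                mission.append(session)
                break
        else:  # no earlier mission matched
            missions.append([session])
    return missions


def session_terms(session):
    """All terms occurring in the queries of a session."""
    return {term for query in session for term in query.split()}


def share_term(session_a, session_b):
    """Toy stand-in for the session-level cascade: any common term."""
    return bool(session_terms(session_a) & session_terms(session_b))


# Sessions assumed as the result of the first run on the example log.
SESSIONS = [
    ["istanbul", "istanbul archeology", "constantinople"],
    ["football barclona", "football barcelona", "real vs barca", "el clasico"],
    ["constantinople"],
]
MISSIONS = detect_missions(SESSIONS, share_term)
```

Here the final "constantinople" session is attached to the earlier Istanbul session even though the football session lies in between, which is exactly what a comparison of consecutive queries alone would miss.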

SLIDE 22

First run: detected sessions

User | Query               | Click domain + Click rank | Time
42   | istanbul            | en.wikipedia.org 1        | 2012-03-22 20:34:17
42   | istanbul archeology |                           | 2012-03-23 12:02:54
42   | istanbul archeology | www.turizm.tr 6           | 2012-03-23 12:03:15
42   | istanbul archeology | www.arkeoloji.tr 13       | 2012-03-23 18:24:07
42   | constantinople      |                           | 2012-03-23 19:12:40
42   | constantinople      | en.wikipedia.org 4        | 2012-03-23 19:13:02
----------------------------------------------------------------------------
42   | football barclona   |                           | 2012-03-23 19:16:01
42   | football barcelona  |                           | 2012-03-23 19:16:11
42   | football barcelona  | www.football.es 3         | 2012-03-23 19:16:15
42   | real vs barca       |                           | 2012-03-23 20:33:04
42   | real vs barca       | en.wikipedia.org 5        | 2012-03-23 20:33:12
42   | el clasico          |                           | 2012-03-23 22:42:48
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-24 10:17:09

SLIDE 23

Second run: multitasking/mission detection

User | Query               | Click domain + Click rank | Time
42   | istanbul            | en.wikipedia.org 1        | 2012-03-22 20:34:17
42   | istanbul archeology |                           | 2012-03-23 12:02:54
42   | istanbul archeology | www.turizm.tr 6           | 2012-03-23 12:03:15
42   | istanbul archeology | www.arkeoloji.tr 13       | 2012-03-23 18:24:07
42   | constantinople      |                           | 2012-03-23 19:12:40
42   | constantinople      | en.wikipedia.org 4        | 2012-03-23 19:13:02
----------------------------------------------------------------------------
42   | football barclona   |                           | 2012-03-23 19:16:01
42   | football barcelona  |                           | 2012-03-23 19:16:11
42   | football barcelona  | www.football.es 3         | 2012-03-23 19:16:15
42   | real vs barca       |                           | 2012-03-23 20:33:04
42   | real vs barca       | en.wikipedia.org 5        | 2012-03-23 20:33:12
42   | el clasico          |                           | 2012-03-23 22:42:48
----------------------------------------------------------------------------
42   | constantinople      |                           | 2012-03-24 10:17:09

SLIDE 25

What about accuracy and runtime?

SLIDE 26

Available evaluation corpora

Gayo-Avello’s session detection corpus (AOL log, 1 annotator)

  • 11 500 queries (but: empty queries, order changed)
  • 215 users (but: many with ≤ 3 queries)
  • 2.7 queries per session (but: several annotation errors)

Lucchese et al.’s mission detection corpus (AOL log, 1 annotator)

  • 1500 queries (but: 97% of queries dropped)
  • 13 users

Our new mission detection corpus (basis: Gayo-Avello, 2 annotators)

  • 8800 queries (empty/URL queries removed)
  • 127 users (users with ≤ 3 queries removed)
  • 11 missions per user with 6.33 queries

SLIDE 30

Accuracy and runtime

Session accuracy on our corpus (6630 queries, 25 % training)

                           | F-Measure | Runtime
Original cascade (3 steps) | 0.875     | 100 %
Improved cascade (3 steps) | 0.890     | 90 %
Improved cascade (4 steps) | 0.890     | ≫100 %

Mission accuracy on our corpus (6630 queries, 25 % training)

  • 556 continuations correctly detected (170 missed)
  • 97 sessions wrongly assigned a continuation
  • F-Measure 0.798

Observations

  • Cascade applicable to mission detection
  • Linked Open Data not that useful yet
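
For orientation, precision, recall, and an F-measure can be computed from the mission counts reported above. Reading 556 as true positives, 170 as false negatives, and 97 as false positives is an assumption, and the slide does not state which F-measure variant or averaging produced 0.798; the plain F1 from these counts comes out slightly higher, at about 0.806, so the sketch is not meant to reproduce the reported figure exactly.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall, and F_beta from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Counts from the slide, under the reading described above.
PRECISION, RECALL, F1 = precision_recall_f(tp=556, fp=97, fn=170)
```

The `beta` parameter is included because session-detection work sometimes weights recall more heavily than precision; whether that applies here is not stated on the slide.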

SLIDE 31

Almost the end: The take-away messages!

SLIDE 32

What we have done

Results

  • Improved cascading method
  • Cheap features first
  • Applicable to mission detection
  • LOD not really useful yet
  • Large mission corpus

Future Work

  • Prune LOD graph
  • Index complete Wikipedia
  • WordNet

SLIDE 34

What we have (not) done

Results

  • Improved cascading method
  • Cheap features first
  • Applicable to mission detection
  • LOD not really useful yet
  • Large mission corpus

Future Work

  • Prune LOD graph
  • Index complete Wikipedia
  • WordNet

Thank you
