
Query Session Detection as a Cascade. Matthias Hagen, Benno Stein. Presentation slides.

Query Session Detection as a Cascade. Matthias Hagen, Benno Stein, Tino Rüb. Bauhaus-Universität Weimar, matthias.hagen@uni-weimar.de. SIR 2011, Dublin, Ireland, April 18, 2011.


  1. Query Session Detection as a Cascade. Matthias Hagen, Benno Stein, Tino Rüb. Bauhaus-Universität Weimar, matthias.hagen@uni-weimar.de. SIR 2011, Dublin, Ireland, April 18, 2011.

  2. Introduction: Motivation. It’s quiz time!

  3. Introduction: Motivation. It’s quiz time! What is the user searching for? Query: paris hilton

  4. Introduction: Motivation. Without context ... Query: paris hilton (image source: [http://upload.wikimedia.org/wikipedia/commons/2/26/Paris Hilton 3 Crop.jpg])

  5. Introduction: Motivation. What if you knew the previous queries? paris hotels; paris marriott; paris hyatt; paris hilton

  6. Introduction: Motivation. What if you knew the previous queries? paris hotels; paris marriott; paris hyatt; paris hilton (image sources: [http://www.alison-anderson.com/wp-content/uploads/hilton hotel paris 2.jpg] [http://maps.google.com/] [http://upload.wikimedia.org/wikipedia/en/e/eb/HI mk logo hiltonbrandlogo.jpg])

  7. Introduction: Motivation. Query sessions: consecutive queries for the same information need. The benefits: improved understanding of user intent; improved retrieval performance via session knowledge.

  8. Introduction: Motivation. Query sessions: consecutive queries for the same information need. The benefits: improved understanding of user intent; improved retrieval performance via session knowledge. The “minor” issue: users do not announce when they start querying for a new information need.

  9. Introduction: Motivation. A typical query log:
     User | Query               | Click domain + click rank | Time
     773  | istanbul            | en.wikipedia.org 1        | 2011-04-16 20:34:17
     773  | istanbul archeology | -                         | 2011-04-17 12:02:54
     773  | istanbul archeology | www.kulturturizm.tr 6     | 2011-04-17 12:03:15
     773  | istanbul archeology | www.arkeoloji.gov.tr 13   | 2011-04-17 18:24:07
     773  | constantinople      | -                         | 2011-04-17 19:00:40
     773  | constantinople      | www.roman-empire.net 4    | 2011-04-17 19:01:02
     773  | hurling             | -                         | 2011-04-17 19:03:01
     773  | hurling             | en.wikipedia.org 1        | 2011-04-17 19:03:05
     773  | liam mccarthy cup   | -                         | 2011-04-17 23:33:04
     773  | liam mccarthy cup   | www.hurling.net 5         | 2011-04-17 23:33:12
     773  | liam mccarthy cup   | starbets.ie 16            | 2011-04-18 12:42:48

  10. Introduction: Motivation. How to determine the break points?
     User | Query               | Click domain + click rank | Time
     773  | istanbul            | en.wikipedia.org 1        | 2011-04-16 20:34:17
     773  | istanbul archeology | -                         | 2011-04-17 12:02:54
     773  | istanbul archeology | www.kulturturizm.tr 6     | 2011-04-17 12:03:15
     773  | istanbul archeology | www.arkeoloji.gov.tr 13   | 2011-04-17 18:24:07
     773  | constantinople      | -                         | 2011-04-17 19:00:40
     773  | constantinople      | www.roman-empire.net 4    | 2011-04-17 19:01:02
     ---------------------------- session break ----------------------------
     773  | hurling             | -                         | 2011-04-17 19:03:01
     773  | hurling             | en.wikipedia.org 1        | 2011-04-17 19:03:05
     773  | liam mccarthy cup   | -                         | 2011-04-17 23:33:04
     773  | liam mccarthy cup   | www.hurling.net 5         | 2011-04-17 23:33:12
     773  | liam mccarthy cup   | starbets.ie 16            | 2011-04-18 12:42:48

  11. Introduction: The Problem. The key is ... automatic query session detection.

  12. Introduction: The Problem. Automatic query session detection. Usual “technique”: for each pair of consecutive queries, check whether they express the same or a new information need. Example (user 773):
     2011-04-16 20:34:17  istanbul
       → same
     2011-04-17 18:24:07  istanbul archeology
       → same
     2011-04-17 19:01:02  constantinople
       → new (session break)
     2011-04-17 19:03:05  hurling
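
The pairwise check on this slide can be sketched as a generic segmentation loop. This is only an illustration: the function names are hypothetical, and the 30-minute gap test is a placeholder assumption standing in for whatever pairwise test is actually used.

```python
from datetime import datetime, timedelta

def split_sessions(log, same_need):
    """Group a time-ordered list of (timestamp, query) pairs into sessions.

    same_need(prev, cur) is any pairwise test that returns True when two
    consecutive log entries belong to the same information need.
    """
    sessions = []
    for entry in log:
        if sessions and same_need(sessions[-1][-1], entry):
            sessions[-1].append(entry)   # continue the current session
        else:
            sessions.append([entry])     # break point: start a new session
    return sessions

# Placeholder pairwise test (an assumption for illustration only):
# a fixed 30-minute time gap.
def within_30_minutes(prev, cur):
    return cur[0] - prev[0] <= timedelta(minutes=30)

log = [
    (datetime(2011, 4, 16, 20, 34, 17), "istanbul"),
    (datetime(2011, 4, 17, 18, 24, 7), "istanbul archeology"),
    (datetime(2011, 4, 17, 19, 1, 2), "constantinople"),
    (datetime(2011, 4, 17, 19, 3, 5), "hurling"),
]
sessions = split_sessions(log, within_30_minutes)
```

With the time-only placeholder, the first three queries are split apart and "hurling" is merged with "constantinople", which disagrees with the same/new labels in the slide's example. Swapping in a different `same_need` function changes the segmentation without touching the loop.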

  13. Introduction: Related Work. Typical features. Temporal thresholds: 5 minutes [Silverstein et al., 1999]; 10–15 minutes [He and Göker, 2000]; 30 minutes [Downey et al., 2007]; user specific [Murray et al., 2006]. Lexical similarity: n-gram overlap [Zhang and Moffat, 2006]; Levenshtein distance [Jones and Klinkner, 2008]. Semantic similarity: search results [Radlinski and Joachims, 2005]; ESA [Lucchese et al., 2011].

  14. Introduction: Related Work. Previous methods. Observations: temporal thresholds are fast but have poor accuracy; feature combinations are more accurate; one of the best is the geometric method (time + lexical) [Gayo-Avello, 2009].

  15. Introduction: Related Work. Previous methods. Observations: temporal thresholds are fast but have poor accuracy; feature combinations are more accurate; one of the best is the geometric method (time + lexical) [Gayo-Avello, 2009]. Shortcomings: all features are evaluated simultaneously (→ runtime); the geometric method ignores semantics (→ accuracy). Examples: subset test suffices: hurling → hurling gaa (same session); geometric method fails: hurling → mccarthy cup (same session).

  16. Cascading Method: The Framework. We address the shortcomings in a cascade ... (image source: [http://wp.ltchambon.com/wp-content/uploads/2010/09/Cascade-de-Tufs-Baume-les-messieurs-Jura.jpg])

  17. Cascading Method: The Framework. ... well ... a small 4-step cascade (image source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg])

  18. Cascading Method: The Framework. ... well ... a small 4-step cascade. Step 1: Subset tests → Step 2: Geometric method → Step 3: ESA similarity → Step 4: Search results (image source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg]). Basic idea: feature cost (runtime) increases from step to step; expensive features are used only if the previous steps are “unreliable.”
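
The basic idea of the cascade can be sketched as a loop over steps ordered by cost. This is a minimal sketch, not the authors' implementation: the three-valued return convention ("same", "new", or `None` for "unreliable"), the fallback to "new" when no step decides, and the two toy steps are all assumptions made for illustration.

```python
def run_cascade(steps, q, q_next):
    """Try steps in order of increasing cost; stop at the first decision.

    Each step returns "same", "new", or None when it cannot decide
    reliably and the next (more expensive) step should be consulted.
    """
    for name, step in steps:
        decision = step(q, q_next)
        if decision is not None:
            return name, decision
    # Assumption: if no step decides, default to starting a new session.
    return None, "new"

# Toy stand-ins for real steps (hypothetical, for illustration only).
def cheap_step(q, q_next):
    # Decides only the trivial case of an exact repetition.
    return "same" if q == q_next else None

def costly_step(q, q_next):
    # Always decides: "same" if the queries share at least one term.
    return "same" if set(q.split()) & set(q_next.split()) else "new"

steps = [("cheap", cheap_step), ("costly", costly_step)]
```

For example, `run_cascade(steps, "hurling", "hurling")` is settled by the cheap step, while `run_cascade(steps, "hurling", "hurling gaa")` falls through to the costly one, so the expensive feature is computed only when needed.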

  19. Cascading Method: Step 1: Subset tests. Simple string comparison. Criterion: consecutive queries $q$ and $q'$ are in the same session if $q$ is a sub- or superset of $q'$. Else: go to Step 2. Remarks: this covers repetition, specialization, and generalization; a time gap means the user is continuing a pending session. Examples: repetition: hurling → hurling (same); specialization: hurling → hurling gaa (same); generalization: hurling gaa → hurling (same).
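
A minimal sketch of the subset criterion, assuming queries are compared as sets of lowercased, whitespace-separated terms (the slide does not specify the tokenization, and the function name is hypothetical):

```python
def subset_test(q, q_next):
    """Return True if one query's term set contains the other's.

    Covers repetition (equal sets), specialization (terms added),
    and generalization (terms removed).
    """
    a = set(q.lower().split())
    b = set(q_next.lower().split())
    return a <= b or b <= a
```

On the slide's examples, `subset_test("hurling", "hurling gaa")` and `subset_test("hurling gaa", "hurling")` both hold, while `subset_test("hurling", "mccarthy cup")` does not, so that pair would be passed on to Step 2. Note that no timestamp enters the test, in line with the remark that a time gap is treated as continuing a pending session.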

  20. Cascading Method: Step 2: Geometric method. Combination of temporal and lexical features [Gayo-Avello, 2009]. For consecutive queries $q$ and $q'$: $f_{\mathrm{temp}} = \max\{0,\ 1 - t / 24\,\mathrm{h}\}$, where $t$ is the time between $q$ and $q'$; $f_{\mathrm{lex}}$ is the cosine similarity of the 3- to 5-grams of $q'$ and $s$, where $s$ is the session of $q$.

  21. Cascading Method: Step 2: Geometric method. Combination of temporal and lexical features [Gayo-Avello, 2009]. For consecutive queries $q$ and $q'$: $f_{\mathrm{temp}} = \max\{0,\ 1 - t / 24\,\mathrm{h}\}$, where $t$ is the time between $q$ and $q'$; $f_{\mathrm{lex}}$ is the cosine similarity of the 3- to 5-grams of $q'$ and $s$, where $s$ is the session of $q$. Criterion (original): consecutive queries $q$ and $q'$ are in the same session if $\sqrt{f_{\mathrm{temp}}^2 + f_{\mathrm{lex}}^2} \geq 1$. [Figure: plane of temporal similarity (x-axis, 0 to 1.0) against lexical similarity (y-axis, 0 to 1.0), divided into a "Same session" region and a "New session" region; annotations: "Nearly identical queries at long temporal distance" and "Different queries with no temporal distance".]
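
The two features and the original criterion can be sketched as follows. Several details are assumptions not stated on the slide: the n-grams are taken to be character n-grams, the session $s$ is represented by joining its queries with spaces, and all function names are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n_min=3, n_max=5):
    """Count all character n-grams of length n_min..n_max (assumed: character level)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity of two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def f_temp(gap_hours):
    """Temporal feature: max(0, 1 - t / 24 h), with t given in hours."""
    return max(0.0, 1.0 - gap_hours / 24.0)

def f_lex(session_queries, q_next):
    """Lexical feature: cosine similarity between the session and the new query."""
    return cosine(char_ngrams(" ".join(session_queries)), char_ngrams(q_next))

def geometric_same_session(session_queries, q_next, gap_hours):
    """Original criterion: sqrt(f_temp^2 + f_lex^2) >= 1."""
    return math.hypot(f_temp(gap_hours), f_lex(session_queries, q_next)) >= 1.0
```

Under these assumptions, "hurling" followed a few minutes later by "mccarthy cup" shares no n-gram with the session, so $f_{\mathrm{lex}} = 0$ and the criterion reports a new session even though the information need is the same.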

  22. Cascading Method: Step 2: Geometric method. Performs well on standard test corpus ... [Figure: two scatter plots of lexical similarity (y-axis, 0 to 1.0) against temporal similarity (x-axis, 0 to 1.0); labels: "Same session", "New session".]

  23. Cascading Method: Step 2: Geometric method. ... but has some problems “on the edge.” [Figure: grid of per-cell counts over temporal similarity (x-axis, 0 to 1.0) and lexical similarity (y-axis, 0 to 1.0).] Major problems: similar queries with a time gap (upper left) → merely a matter of opinion; different queries with the same semantics (lower right) → incorporate semantics.

  24. Cascading Method: Step 2: Geometric method. ... but has some problems “on the edge.” [Figure: grid of per-cell counts over temporal similarity (x-axis, 0 to 1.0) and lexical similarity (y-axis, 0 to 1.0).] Major problems: similar queries with a time gap (upper left) → merely a matter of opinion; different queries with the same semantics (lower right) → incorporate semantics. Criterion (adapted): apply the original geometric method if $f_{\mathrm{temp}} < 0.8$ or $f_{\mathrm{lex}} > 0.4$. Else: go to Step 3.
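
The adapted criterion can be sketched as a three-valued decision. The function name and the use of `None` to mean "go to Step 3" are assumptions for illustration; the thresholds 0.8 and 0.4 and the original test are taken from the slides.

```python
import math

def adapted_geometric(f_temp, f_lex):
    """Return "same" or "new" where the geometric method is trusted, else None.

    The original test sqrt(f_temp^2 + f_lex^2) >= 1 is applied only if
    f_temp < 0.8 or f_lex > 0.4. The remaining region (small time gap,
    low lexical similarity) is left undecided for Step 3.
    """
    if f_temp < 0.8 or f_lex > 0.4:
        return "same" if math.hypot(f_temp, f_lex) >= 1.0 else "new"
    return None  # undecided: defer to the semantic features of Step 3
```

For instance, `adapted_geometric(0.95, 0.1)` returns `None`: the queries are close in time but lexically different, which is exactly the lower-right region where semantics may still link them.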
