Supporting Scholarly Search with Keyqueries (PowerPoint presentation)

SLIDE 1

Supporting Scholarly Search with Keyqueries

Matthias Hagen Anna Beyer Tim Gollub Kristof Komlossy Benno Stein

Bauhaus-Universität Weimar matthias.hagen@uni-weimar.de @matthias_hagen

ECIR 2016 Padova, Italy March 23, 2016

Hagen, Beyer, Gollub, Komlossy, Stein Supporting Scholarly Search with Keyqueries 1

SLIDE 2

SLIDE 3

When you start exploring a new topic

SLIDE 4

When you start exploring a new topic

The first papers are easily found (colleagues, web search, . . . ).
But to find “everything”:
  • Follow references and citations
  • Check Google Scholar “Related articles”
  • Formulate new queries from the papers read so far

SLIDE 5

When you start exploring a new topic

The first papers are easily found (colleagues, web search, . . . ).
But to find “everything”:
  • Follow references and citations
  • Check Google Scholar “Related articles”
  • Formulate new queries from the papers read so far
. . . takes time

SLIDE 6

When you start exploring a new topic

The first papers are easily found (colleagues, web search, . . . ).
But to find “everything”:
  • Follow references and citations
  • Check Google Scholar “Related articles”
  • Formulate new queries from the papers read so far
. . . takes time . . . a lot of time

SLIDE 7

Automatic suggestions to the rescue!

SLIDE 8

Formalized as a problem

RELATED WORK SEARCH
Given: A small input set D of papers.
Task: Find an output set R of related papers.

SLIDE 9

Related work for related work search

(80+ papers)

Citation-Based

[Golshan et al., SIGMOD 2012] [Caragea et al., JCDL 2013] [Ekstrand et al., RecSys 2010] [Küçüktunç et al., JCDL 2013] [Sugiyama and Kan, JCDL 2013]

Content-Based

[Nascimento et al., JCDL 2011] [Huang et al., CIKM 2012] [Kataria, Mitra, and Bhatia, AAAI 2010] [Lu et al., CIKM 2011] [Nallapati et al., KDD 2008] [Tang et al., PAKDD 2009 & SIGIR 2014]

Mixed

[Google Scholar “Related articles”] [El-Arini and Guestrin, KDD 2011] [He et al., WWW 2010 & WSDM 2011] [Livne et al., SIGIR 2014] [Wang and Blei, KDD 2011]

SLIDE 11

Our contribution is query formulation (content-based)

SLIDE 12

The key is . . .

SLIDE 13

The key is . . . keyqueries

SLIDE 14

What is a keyquery?

Query q is a keyquery for a set D of documents against a search engine iff

1. Every d ∈ D is in the top-k results of q. (specificity)
2. Query q has at least l results. (generality)
3. No q′ ⊂ q satisfies the above. (minimality)

Remark: For small |D| ≤ 5, typically l ≥ 10 and k = 10.
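The three conditions can be checked mechanically. The following Python sketch assumes a hypothetical `search` callable that returns the full ranked list of document ids matching a query (a set of terms); it is not the API of any real search engine. A toy conjunctive index with an arbitrary ranking (shorter documents first) stands in for the engine, and minimality is tested by brute force over all non-empty proper subsets, which is only feasible for short queries.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set

# Hypothetical search interface: query (set of terms) -> full ranked list
# of matching document ids. Stands in for a real engine's API.
Search = Callable[[FrozenSet[str]], List[str]]

# Toy index for illustration only: document id -> terms it contains.
TOY_INDEX = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b"},
    "d3": {"a", "c"},
    "d4": {"b", "c"},
    "d5": {"a"},
}


def toy_search(query: FrozenSet[str]) -> List[str]:
    """Conjunctive (AND) retrieval; shorter documents rank first (arbitrary)."""
    hits = [d for d, terms in TOY_INDEX.items() if query <= terms]
    return sorted(hits, key=lambda d: (len(TOY_INDEX[d]), d))


def meets_k_l(query: FrozenSet[str], docs: Set[str], search: Search,
              k: int, l: int) -> bool:
    """Conditions 1 and 2: all docs in the top-k, and at least l results."""
    results = search(query)
    return docs <= set(results[:k]) and len(results) >= l


def is_keyquery(query: FrozenSet[str], docs: Set[str], search: Search,
                k: int = 10, l: int = 10) -> bool:
    """Conditions 1-3; minimality is checked over all non-empty proper subsets."""
    if not meets_k_l(query, docs, search, k, l):
        return False
    for size in range(1, len(query)):
        for sub in combinations(sorted(query), size):
            if meets_k_l(frozenset(sub), docs, search, k, l):
                return False  # a smaller query already works: not minimal
    return True


print(is_keyquery(frozenset({"a", "b"}), {"d1"}, toy_search, k=2, l=2))       # True
print(is_keyquery(frozenset({"a", "b", "c"}), {"d1"}, toy_search, k=2, l=1))  # False: {a, b} suffices
```

The second call fails only on minimality: {a, b, c} ranks d1 first, but its subset {a, b} already satisfies conditions 1 and 2.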

SLIDE 15

Example: Keyquery for a paper (l ≥ 1000, k = 3)

SLIDE 16

Example: chatnoir is a keyquery against Google Scholar

SLIDE 17

Example: chatnoir is a keyquery against Google Scholar

SLIDE 18

Example: . . . but not against Google

SLIDE 19

Example: . . . but not against Google

SLIDE 20

Keyqueries as a conceptual framework

  • Represent a document (set) by its keyqueries
  • Related documents also appear in the top results
  • From keywords to keyqueries
  • The retrieval model is exploited!

SLIDE 21

Our general algorithmic idea

Assumption: we work on the user side without direct index access, but with API access.

Solution:

1. Keyphrase extraction from the input documents [KP-Miner, 2009]
2. Keyquery cover using the keyphrases
3. Keyquery results as suggestions
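Step 3 can be sketched as pooling the top-k results of the computed keyqueries and dropping the input papers. The slides do not say how the pooled suggestions are ranked, so the ordering below (by the number of keyqueries that retrieve a paper, ties broken by id) is an assumption; `search` is a hypothetical callable and the toy index is illustrative only.

```python
from collections import Counter
from typing import Callable, FrozenSet, Iterable, List, Set

# Hypothetical search interface: query -> full ranked list of document ids.
Search = Callable[[FrozenSet[str]], List[str]]

# Toy index for illustration only: document id -> terms it contains.
TOY_INDEX = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b"},
    "d3": {"a", "c"},
    "d4": {"b", "c"},
    "d5": {"a"},
}


def toy_search(query: FrozenSet[str]) -> List[str]:
    """Conjunctive (AND) retrieval; shorter documents rank first (arbitrary)."""
    hits = [d for d, terms in TOY_INDEX.items() if query <= terms]
    return sorted(hits, key=lambda d: (len(TOY_INDEX[d]), d))


def suggest(keyqueries: Iterable[FrozenSet[str]], inputs: Set[str],
            search: Search, k: int = 10) -> List[str]:
    """Pool the top-k results of all keyqueries, excluding the input papers."""
    votes: Counter = Counter()
    for query in keyqueries:
        for doc in search(query)[:k]:
            if doc not in inputs:
                votes[doc] += 1
    # Assumed ranking: more retrieving keyqueries first, then document id.
    return sorted(votes, key=lambda doc: (-votes[doc], doc))


print(suggest([frozenset({"a", "b"}), frozenset({"a", "c"})], {"d1"}, toy_search, k=2))
# ['d2', 'd3']
```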

SLIDE 22

The keyquery cover problem

KEYQUERY COVER
Given: (1) A vocabulary W extracted from a set D of documents.
       (2) Levels k and l describing keyquery generality.
Task: Find a simple set Q ⊆ 2^W of queries that are keyqueries for every d ∈ D with respect to k and l and that together cover W.
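A candidate solution Q can be validated against the task statement. The sketch below reads “keyqueries for every d ∈ D” as keyqueries for the whole set D, as in the keyquery definition, and does not model the “simple” requirement; the `search` callable and the toy index are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set

# Hypothetical search interface: query -> full ranked list of document ids.
Search = Callable[[FrozenSet[str]], List[str]]

# Toy index for illustration only: document id -> terms it contains.
TOY_INDEX = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b"},
    "d3": {"a", "c"},
    "d4": {"b", "c"},
    "d5": {"a"},
}


def toy_search(query: FrozenSet[str]) -> List[str]:
    """Conjunctive (AND) retrieval; shorter documents rank first (arbitrary)."""
    hits = [d for d, terms in TOY_INDEX.items() if query <= terms]
    return sorted(hits, key=lambda d: (len(TOY_INDEX[d]), d))


def meets_k_l(query: FrozenSet[str], docs: Set[str], search: Search, k: int, l: int) -> bool:
    results = search(query)
    return docs <= set(results[:k]) and len(results) >= l


def is_keyquery(query: FrozenSet[str], docs: Set[str], search: Search, k: int, l: int) -> bool:
    if not meets_k_l(query, docs, search, k, l):
        return False
    return not any(meets_k_l(frozenset(sub), docs, search, k, l)
                   for size in range(1, len(query))
                   for sub in combinations(sorted(query), size))


def is_keyquery_cover(cover: List[FrozenSet[str]], vocabulary: Set[str],
                      docs: Set[str], search: Search, k: int, l: int) -> bool:
    """Q ⊆ 2^W, every query in Q is a keyquery for D, and Q jointly covers W."""
    if not all(query <= vocabulary for query in cover):
        return False
    if not all(is_keyquery(query, docs, search, k, l) for query in cover):
        return False
    covered: Set[str] = set().union(*cover)
    return covered == vocabulary


W = {"a", "b", "c"}
print(is_keyquery_cover([frozenset({"a", "b"}), frozenset({"a", "c"})], W, {"d1"}, toy_search, 2, 2))  # True
print(is_keyquery_cover([frozenset({"a", "b"})], W, {"d1"}, toy_search, 2, 2))  # False: c uncovered
```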

SLIDE 23

Keyquery cover computation

1. Sort the keyphrases by importance.
2. Greedily add keyphrases until the query is a keyquery.
3. Start again with the first not-yet-covered phrase.

[Figure: lattice of all queries formed from the keyphrases p1, . . . , p5, from the empty query { } up to {p1, p2, p3, p4, p5}; legend: overly specific queries, overly generic queries, query combination constraint]
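The three steps suggest a greedy loop. The sketch below is one plausible reading, not the authors' implementation: phrases are tried in importance order, a phrase is skipped if adding it would leave fewer than l results (overly specific), and neither minimality (condition 3) nor the query combination constraint from the figure is enforced. The `search` callable and toy index are hypothetical, and the toy phrases a, b, c are assumed to be already sorted by importance (step 1).

```python
from typing import Callable, FrozenSet, List, Set

# Hypothetical search interface: query -> full ranked list of document ids.
Search = Callable[[FrozenSet[str]], List[str]]

# Toy index for illustration only: document id -> terms it contains.
TOY_INDEX = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b"},
    "d3": {"a", "c"},
    "d4": {"b", "c"},
    "d5": {"a"},
}


def toy_search(query: FrozenSet[str]) -> List[str]:
    """Conjunctive (AND) retrieval; shorter documents rank first (arbitrary)."""
    hits = [d for d, terms in TOY_INDEX.items() if query <= terms]
    return sorted(hits, key=lambda d: (len(TOY_INDEX[d]), d))


def meets_k_l(results: List[str], docs: Set[str], k: int, l: int) -> bool:
    """Conditions 1 and 2 on an already retrieved result list."""
    return docs <= set(results[:k]) and len(results) >= l


def greedy_keyquery_cover(phrases: List[str], docs: Set[str], search: Search,
                          k: int, l: int) -> List[FrozenSet[str]]:
    """`phrases` must already be sorted by importance (step 1)."""
    covered: Set[str] = set()
    cover: List[FrozenSet[str]] = []
    for start in phrases:
        if start in covered:
            continue  # step 3: restart only from not-yet-covered phrases
        query = frozenset([start])
        results = search(query)
        for candidate in phrases:  # step 2: greedy extension
            if meets_k_l(results, docs, k, l):
                break  # query now satisfies conditions 1 and 2
            if candidate in query:
                continue
            extended = query | {candidate}
            extended_results = search(extended)
            if len(extended_results) < l:
                continue  # would be overly specific: skip this phrase
            query, results = extended, extended_results
        if meets_k_l(results, docs, k, l):
            cover.append(query)
            covered |= query
    return cover


print(greedy_keyquery_cover(["a", "b", "c"], {"d1"}, toy_search, k=2, l=2))
```

On the toy index this yields {a, b} (starting from a) and then {a, c} (restarting from the uncovered c).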

SLIDE 24

Evaluation

SLIDE 25

Are the users impressed?!

SLIDE 26

User study

Collection: 200,000 CS papers (top conferences as seeds)
Search engine: Lucene 5.0, BM25F (title, abstract, body)
Participants: 13 researchers, 7 students
Topics: 42, provided by the participants

1. The participant provides up to five input papers for a familiar topic.
2. The participant provides at least one expected document.
3. The algorithms run on the input against our collection.
4. The participant judges relevance and familiarity.

SLIDE 27

User study results

Algorithm           nDCG@10   rec_e@50   rec_ur@10
Nascimento          0.58      0.34       0.16
Sofia Search        0.60      0.33       0.20
Google Scholar      0.60      0.43       0.21
Keyquery Cover      0.62      0.37       0.16
KQC+Sofia+Google    0.65      0.48       0.24

  • Nascimento query baseline outperformed
  • On a par with Google Scholar and Sofia Search
  • Rather different suggestions (overlap < 50%)
  • Combination most promising
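For reference, nDCG@10 is the standard normalized discounted cumulative gain over the top 10 suggestions. The sketch below uses linear gains and a log2 rank discount and normalizes by the ideal ordering of the supplied judgments; the slides do not state the gain scale used in the study, so this variant is an assumption.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: Sequence[float], k: int = 10,
              all_judged: Optional[Sequence[float]] = None) -> float:
    """nDCG@k. `all_judged` holds the gains of all known relevant documents for
    the ideal ranking; by default the ideal is built from `gains` alone."""
    pool = list(all_judged) if all_judged is not None else list(gains)
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


print(round(ndcg_at_k([0, 1, 1]), 3))  # 0.693
```

Passing `all_judged` matters when relevant documents exist that the algorithm did not retrieve; otherwise a short list of only relevant hits would score a perfect 1.0.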

SLIDE 28

Runtime?!

SLIDE 29

API requests needed in user study

Nascimento: 19
Google Scholar: 21
Sofia Search: at least twice as fast as keyqueries
Keyquery Cover: 59

SLIDE 30

API requests needed in user study

Nascimento: 19
Google Scholar: 21
Sofia Search: at least twice as fast as keyqueries
Keyquery Cover: 59

Keyqueries could be pre-computed by a scholarly search engine and stored in a reverted index.

[Pickens, Cooper, and Golovchinsky, CIKM 2010]
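A reverted index swaps the roles of queries and documents: each document's posting list holds the queries that retrieve it. The sketch below assumes a fixed set of candidate queries (here, all non-empty subsets of a toy keyphrase vocabulary) is run offline against a hypothetical `search` callable; postings store rank and result count so that conditions 1 and 2 become lookups at suggestion time. Minimality is not checked.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical search interface: query -> full ranked list of document ids.
Search = Callable[[FrozenSet[str]], List[str]]
Posting = Tuple[FrozenSet[str], int, int]  # (query, rank, number of results)

# Toy index for illustration only: document id -> terms it contains.
TOY_INDEX = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b"},
    "d3": {"a", "c"},
    "d4": {"b", "c"},
    "d5": {"a"},
}


def toy_search(query: FrozenSet[str]) -> List[str]:
    """Conjunctive (AND) retrieval; shorter documents rank first (arbitrary)."""
    hits = [d for d, terms in TOY_INDEX.items() if query <= terms]
    return sorted(hits, key=lambda d: (len(TOY_INDEX[d]), d))


def build_reverted_index(queries: Iterable[FrozenSet[str]], search: Search,
                         k: int = 10) -> Dict[str, List[Posting]]:
    """Offline: record, for every document, the queries that rank it in the top k."""
    reverted: Dict[str, List[Posting]] = defaultdict(list)
    for query in queries:
        results = search(query)
        for rank, doc in enumerate(results[:k], start=1):
            reverted[doc].append((query, rank, len(results)))
    return dict(reverted)


def keyquery_candidates(reverted: Dict[str, List[Posting]], docs: Set[str],
                        l: int) -> Set[FrozenSet[str]]:
    """Online: queries with >= l results that rank every input doc in the top k."""
    per_doc = [{q for q, _, n in reverted.get(d, []) if n >= l} for d in docs]
    return set.intersection(*per_doc) if per_doc else set()


vocabulary = ["a", "b", "c"]
queries = [frozenset(c) for size in range(1, len(vocabulary) + 1)
           for c in combinations(vocabulary, size)]
reverted = build_reverted_index(queries, toy_search, k=2)
print(sorted(sorted(q) for q in keyquery_candidates(reverted, {"d1"}, l=2)))
# [['a', 'b'], ['a', 'c'], ['b', 'c']]
```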

SLIDE 31

Almost the end: The take-home messages!

SLIDE 32

What we have done

Results

  • Keyqueries for scholarly search
  • Keyquery cover from keyphrases
  • Query baseline outperformed
  • On a par with Google Scholar and Sofia Search
  • Combination is best

Future Work

  • Efficiency
  • Other topics and corpora
  • Retrieval model influence
  • Improved suggestion ranking

SLIDE 33

What we have (not) done

Results

  • Keyqueries for scholarly search
  • Keyquery cover from keyphrases
  • Query baseline outperformed
  • On a par with Google Scholar and Sofia Search
  • Combination is best

Future Work

  • Efficiency
  • Other topics and corpora
  • Retrieval model influence
  • Improved suggestion ranking

SLIDE 34

What we have (not) done

Results

  • Keyqueries for scholarly search
  • Keyquery cover from keyphrases
  • Query baseline outperformed
  • On a par with Google Scholar and Sofia Search
  • Combination is best

Future Work

  • Efficiency
  • Other topics and corpora
  • Retrieval model influence
  • Improved suggestion ranking

Thank you
