SLIDE 1

WebCLEF 2007 — The Overview

Valentin Jijkoun, Maarten de Rijke

SLIDE 9

Overview

• A bit of history
• Task description
• Assessment
• Evaluation measures
• Runs
• Results
• Conclusion

SLIDE 18

WebCLEF — A bit of history

• Launched as a known-item search task in 2005, repeated in 2006
  • Resources created were used for a number of purposes
• But there are information needs out there besides navigational ones, even on the web
• WiQA
  • Pilot that ran at QA@CLEF 2006
  • Question answering using Wikipedia
  • Undirected informational queries: “Tell me about X”

SLIDE 21

Task description

• Wishes
  • Task close to a Real-World™ information need
  • Clear definition of a user
  • Multi-linguality should come naturally
  • Collections should be a natural source
  • Collections, topics, and assessors’ judgments re-usable
  • Challenging
• Our hypothetical user
  • “A knowledgeable person, writing a survey or overview with a clear goal and audience in mind.”
  • Locate items of information to be included in the article to be written, and use an automatic system to support this
  • Use online resources only

SLIDE 24

Task description (2)

• User formulates her information need (“topic”) (fields sketched below)
  • A short topic title (e.g., the title of the survey article)
  • A free-text description of the goals and intended audience
  • A list of languages in which the user is willing to accept results
  • Optional list of known sources (URLs of docs the user considers relevant)
  • Optional list of Google retrieval queries
• Example
  • title: Significance testing
  • description: I want to write a survey (about 10 screens) for undergraduate students on statistical significance testing, with an overview of the ideas, common misunderstandings, and critiques. I will assume some basic knowledge of statistics.
  • language(s): English
  • known sources: http://en.wikipedia.org/wiki/Statistical_hypothesis_testing ..
  • retrieval queries: significance testing ; site:mathworld.wolfram.com ; ...
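A topic with these fields could be modeled as in the sketch below (an illustrative Python structure; the field names are assumptions for exposition, not the official WebCLEF 2007 topic format):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Topic:
        """One WebCLEF 2007 topic, per the fields on this slide (illustrative)."""
        title: str            # short title, e.g. of the survey article to be written
        description: str      # free-text description of goals and intended audience
        languages: List[str]  # languages the user is willing to accept
        known_sources: List[str] = field(default_factory=list)  # optional known URLs
        queries: List[str] = field(default_factory=list)        # optional Google queries

    topic = Topic(
        title="Significance testing",
        description="Survey (about 10 screens) for undergraduate students ...",
        languages=["English"],
        known_sources=["http://en.wikipedia.org/wiki/Statistical_hypothesis_testing"],
        queries=["significance testing", "site:mathworld.wolfram.com"],
    )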

SLIDE 27

Task description (3)

• Data
  • Close to the Real-World™ scenario, but tractable
  • Define a collection per topic (“mashup”): all “known” sources specified, plus the top 1000 results per retrieval query
  • Per result: the query that retrieved it, its rank, and a conversion (of HTML, PDF, PS) to plain text
• System’s response (format sketched below)
  • Ranked list of plain-text snippets extracted from the sub-collection of the topic
  • Each snippet indicates its origin
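A run could then be a ranked list of snippet records, each carrying its origin, roughly as below (a minimal illustrative sketch; the actual WebCLEF 2007 submission format is not shown on the slide, so these names are assumptions):

    from dataclasses import dataclass

    @dataclass
    class Snippet:
        """One plain-text snippet in a ranked response (illustrative)."""
        topic_id: str    # topic this snippet answers
        rank: int        # position in the ranked list
        source_url: str  # origin document in the topic's sub-collection
        text: str        # the extracted plain-text span

    run = [
        Snippet("topic-01", 1,
                "http://en.wikipedia.org/wiki/Statistical_hypothesis_testing",
                "A statistical hypothesis test is a method of ..."),
        # ... further snippets in decreasing order of estimated relevance
    ]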

SLIDE 30

Assessment

• Manual assessment by the topic creators
  • Somewhat similar to “Other” questions at TREC 2006
  • Blind
  • Pool responses of all systems into an anonymized sequence of text segments (pooling sketched below)
  • For each response, only include the first 7,000 characters
• The assessor was asked …
  • To create a list of nuggets (“atomic facts”) that should be included in the article for the topic
  • To link character spans from a response to nuggets
  • Different spans within a single snippet may be linked to multiple nuggets
  • To mark a span as “known” if it expresses a fact present in a known source
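The pooling step might look roughly like this (a sketch, assuming each run is a list of snippet objects with a .text attribute as in the earlier sketch; only the 7,000-character cap comes from the slide):

    import random

    def pool_responses(runs, char_limit=7000):
        """Pool all systems' responses for one topic into an anonymized,
        shuffled sequence of text segments (illustrative)."""
        segments = []
        for run in runs:
            used = 0  # characters taken from this response so far
            for snippet in run:
                take = snippet.text[: char_limit - used]
                if not take:
                    break  # per-response cap reached
                segments.append(take)
                used += len(take)
        random.shuffle(segments)  # hide which system produced which segment
        return segments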

SLIDE 37

Assessment (3)

• Similar to INEX and some TREC tasks, assessment was carried out by the topic creators

SLIDE 40

Evaluation measures

• Based on standard precision and recall
• For a given response R (a ranked list of snippets) of a system S for topic T, define (sketched below):
  • recall: the sum of the character lengths of all spans in R linked to nuggets, divided by the total sum of span lengths
  • precision: the number of characters that belong to at least one span linked to a nugget, divided by the total character length of R
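As a sketch, the two measures could be computed as follows (illustrative Python; representing spans as (start, end) character offsets into the concatenated response is an assumption about the data layout, not the official evaluation code):

    def recall(linked_spans, total_span_length):
        """Sum of lengths of spans in R linked to nuggets, divided by the
        total sum of span lengths (illustrative)."""
        found = sum(end - start for start, end in linked_spans)
        return found / total_span_length if total_span_length else 0.0

    def precision(linked_spans, response_length):
        """Number of characters of R inside at least one nugget-linked span,
        divided by the total character length of R (illustrative)."""
        covered = set()
        for start, end in linked_spans:
            covered.update(range(start, end))  # union handles overlapping spans
        return len(covered) / response_length if response_length else 0.0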

SLIDE 42

Runs

• What did people try?
  • DCU: sentence-based snippets; multiple ways of re-ranking snippets: (1) word overlap with topic, description, known sources (sketched below); (2) word overlap plus thresholding; (3) comparing parses of known sources with parses of snippets
  • UIndonesia
  • USAL: fixed-size text windows (1500 bytes); focus on segmentation (“snippet generation”); ranking based on structured queries (topic, description, anchor text, the vocabulary from the “known sources”)
  • UvA: sentence-based and paragraph-based snippets; centrality scores plus penalties for overlap with known sources (see next talk)
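As an illustration of the word-overlap re-ranking idea from the DCU entry (assuming simple whitespace tokenization; a sketch of the idea, not DCU's actual system):

    def overlap_score(snippet_text, topic_text):
        """Fraction of snippet words that also occur in the topic's title,
        description, or known-source text (illustrative)."""
        snippet_words = set(snippet_text.lower().split())
        topic_words = set(topic_text.lower().split())
        if not snippet_words:
            return 0.0
        return len(snippet_words & topic_words) / len(snippet_words)

    def rerank(snippets, topic_text):
        """Order candidate snippet strings by decreasing word overlap."""
        return sorted(snippets, key=lambda s: overlap_score(s, topic_text),
                      reverse=True)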

SLIDE 46

Runs (2)

• Four groups submitted 12 runs
• Baseline: Google
  • Ranked list of at most 1,000 snippets
• Disclaimer
  • Google’s web search engine was not designed for WebCLEF 2007 (but for web page finding)
  • Don’t interpret the baseline as an assessment of or a comment on Google as a web search engine

SLIDE 48

Results

• Baseline plus P/R values at three cut-off points

SLIDE 54

Results (2)

• UVA par vs and UVA par wo: best performance across all cut-off points
• USAL reina0.25 and USAL reina1 have comparable performance
• Note: precision grows as the cut-off point increases
  • Systems manage to find relevant snippets, but the ranking is far from optimal

SLIDE 58

Conclusions

• WebCLEF 2007
  • New task
  • Aimed at undirected informational search goals (“Tell me about X.”)
  • Most submitted runs outperformed the Google-based baseline
• Work left to be done
  • Detailed error analysis
  • Refine assessor guidelines
  • Refine evaluation interface
  • Not enough topics?
• Thanks
  • Participants and assessors

SLIDE 65

WebCLEF — The Future?

• Why does the most interesting task at CLEF have so few participants?
• The track will only continue if the number of participants is sufficiently large
• To help you prepare for next year
  • 2007 topics, documents, qrels
  • Code of the best-performing 2007 system freely available (as a baseline)
• More at the breakout session (12:00–13:00, this room)

SLIDE 70

Tips

• Look at summarization
  • Evaluation measures
  • Especially the last two years
• Passages vs paragraphs