Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)
Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu
Vision
Source: saveur.com
The ultimate candy store for information retrieval researchers!
Source: Wikipedia (Candy)
See a result you like? Click a button to recreate those results!
Really, any result?
(not quite… let’s start with batch ad hoc retrieval experiments on standard test collections)
Repeatability: you can recreate your own results again
Reproducibility: others can recreate your results (with code they rewrite)
Replicability: others can recreate your results (with your code)
ACM Artifact Review and Badging Guidelines
Repeatability: we get this "for free"
Reproducibility: a stepping stone…
Replicability: our focus
Armstrong et al. (CIKM 2009): little empirical progress made from 1998 to 2009. Why? Researchers compare against weak baselines.
Yang et al. (SIGIR 2019): researchers still compare against weak baselines.
Open-Source Code!
A good start, but far from enough…
TREC 2015 “Open Runs”
Voorhees et al. Promoting Repeatability Through Open Runs. EVIA 2016.
79 submitted runs…
[Chart: number of runs successfully replicated]
Open-Source Code!
A good start, but far from enough…
Ask developers to show us how!
Open-Source IR Reproducibility Challenge (OSIRRC), part of the SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR)
Participants contributed end-to-end scripts for replicating ad hoc retrieval experiments
Lin et al. Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge. ECIR 2016.
7 participating systems, GOV2 collection
[Figure: MAP by system/model (y-axis 0.00 to 0.75): Terrier (BM25, DPH, DPH+Bo1 QE, DPH+Prox SD), Galago (QL, SDM), JASS (2.5M P, 1B P), Indri (QL, SDM), MG4J (B, B+, BM25), ATIRE (BM25, Quant. BM25), Lucene (BM25 Count, BM25 Pos.)]
System Effectiveness
7 participating systems, GOV2 collection
[Figure: per-query search time in ms (log scale, 1 to 100,000) for the same 17 runs across JASS, MG4J, ATIRE, Lucene, Terrier, Galago, and Indri]
System Efficiency
7 participating systems, GOV2 collection
[Figure: scatter plot of MAP (.28 to .34) vs. search time (100 to 10,000 ms, log scale) for all 17 runs: ATIRE (BM25, Quant. BM25), Galago (QL, SDM), Indri (QL, SDM), JASS (1B P, 2.5M P), Lucene (BM25 Count, BM25 Pos.), MG4J (B, B+, BM25), Terrier (BM25, DPH, DPH+Bo1 QE, DPH+Prox SD)]
Effectiveness/Efficiency Tradeoff
Open-Source Code!
A good start, but far from enough…
Ask developers to show us how!
It worked, but…
We actually pulled it off!
Technical infrastructure was brittle
Replication scripts were too under-constrained
Source: Wikipedia (Burj Khalifa)
[Diagram: virtual machines vs. containers]
Virtual machines: each app runs in its own guest OS inside a VM, on a hypervisor, on the physical machine
Containers: apps run in containers that share the host OS, managed by a container engine on the physical machine
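This layering is why containers are attractive for replicability: an image pins the entire software stack (OS libraries, toolchain, and retrieval system code), so an experiment can be re-run on any machine with a container engine. A minimal sketch using the Docker SDK for Python; the image name, command, and mount point are hypothetical placeholders, not an actual OSIRRC image:

```python
import docker  # Docker SDK for Python: pip install docker

client = docker.from_env()

# Pinning a specific image tag (or, better, a digest) fixes the whole
# stack that the retrieval run depends on. The image name, command, and
# mount point below are hypothetical.
logs = client.containers.run(
    "example/ir-system:v1.0",  # hypothetical pinned image
    command=["/search", "--topics", "/input/topics.robust04.txt"],
    volumes={"/data/robust04": {"bind": "/input", "mode": "ro"}},
    remove=True,               # clean up the container after the run
)
print(logs.decode())
```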
Docker-based infrastructure for ad hoc retrieval experiments: the "jig"
(encourage adoption, broaden to other tasks, etc.)
[Diagram: interaction between the jig and a Docker image]
Prepare phase: the user specifies <image>:<tag>; the jig starts the image and triggers the init and index hooks; the image creates a snapshot.
Search phase: the jig triggers the search hook with the snapshot <image>:<tag>; the image produces run files, which are scored with trec_eval.
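A minimal sketch, in Python, of how a jig-like driver could orchestrate these hooks through the Docker CLI. This is not the actual jig implementation; the hook paths (/init, /index, /search), the mount points (/input, /topics, /output), and the choice of reporting only MAP are illustrative assumptions:

```python
import subprocess
from pathlib import Path

def sh(*cmd):
    """Run a command, raising on failure and capturing its output."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)

def prepare(image, tag, collection, snapshot_tag):
    # Prepare phase (sketch): start the image, trigger the init and index
    # hooks, then commit the container as a snapshot so the expensive
    # index build happens only once.
    cid = sh("docker", "run", "-d",
             "-v", f"{collection}:/input:ro",
             f"{image}:{tag}", "sleep", "infinity").stdout.strip()
    sh("docker", "exec", cid, "/init")   # init hook: fetch code, models, etc.
    sh("docker", "exec", cid, "/index")  # index hook: build the index
    sh("docker", "commit", cid, f"{image}:{snapshot_tag}")
    sh("docker", "rm", "-f", cid)

def search(image, snapshot_tag, topics, qrels, out_dir):
    # Search phase (sketch): trigger the search hook on the snapshot; the
    # image writes TREC run files to /output, scored here with trec_eval.
    sh("docker", "run", "--rm",
       "-v", f"{topics}:/topics:ro",
       "-v", f"{out_dir}:/output",
       f"{image}:{snapshot_tag}", "/search", "--topics", "/topics")
    for run in sorted(Path(out_dir).iterdir()):
        print(sh("trec_eval", "-m", "map", qrels, str(run)).stdout)
```

Committing the container after indexing is the key design point: every search run starts from the same frozen snapshot, so results cannot drift with the environment.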
Source: Flickr (https://www.flickr.com/photos/m00k/15789986125/)
Focus on newswire collections: Robust04, Core17, Core18
Official runs on Microsoft Azure
Thanks Microsoft for free credits!
Anserini (University of Waterloo)
Anserini-bm25prf (Waseda University)
ATIRE (University of Otago)
Birch (University of Waterloo)
Elastirini (University of Waterloo)
EntityRetrieval (Ryerson University)
Galago (University of Massachusetts)
ielab (University of Queensland)
Indri (TU Delft)
IRC-CENTRE2019 (Technische Hochschule Köln)
JASS (University of Otago)
JASSv2 (University of Otago)
NVSM (University of Padua)
OldDog (Radboud University)
PISA (New York University and RMIT University)
Solrini (University of Waterloo)
Terrier (TU Delft and University of Glasgow)
Images captured diverse models:
query expansion and relevance feedback
conjunctive and efficiency-oriented query processing
neural ranking models
Source: Time Magazine
Source: Washington Post
TREC best: 0.333
TREC median (title): 0.258
Docker-based infrastructure for ad hoc retrieval experiments: the "jig"
(encourage adoption, broaden to other tasks, etc.)
Source: flickr (https://www.flickr.com/photos/39414578@N03/16042029002)