Overview of NTCIR-14
Makoto P. Kato, University of Tsukuba
Yiqun Liu, Tsinghua University
Introduction to NTCIR

Evaluation Forum
– An opportunity for researchers to get together to solve challenging research problems based on cooperation between task organizers and participants: organizers design tasks, build test collections, and evaluate participants' systems' performances in the tasks

[Figure: task organizers provide tasks and test collections to participants; participants submit system output and receive evaluation results.]
Benefits of Evaluation Forums

For the organizers and the community:
– Can obtain many findings on a certain problem
– Can share the workload of building a large-scale test collection
– Can draw attention to a certain research direction

For the participants:
– Can focus on solving problems
– Can tackle well-recognized problems with new resources, or novel problems at an early stage
– Can demonstrate the performance of their developed systems in a fair comparison
Task Selection Procedure

NTCIR tasks are selected from proposals from task organizers.

[Figure: task organizers submit task proposals to the program committee; PC members write reviews, and the PC co-chairs make the decision.]

Please consider proposing a task at NTCIR!
NTCIR-14 Program Committee (PC)
– University of Delaware, USA
– Hsin-Hsi Chen, National Taiwan University, Taiwan
– Tat-Seng Chua, National University of Singapore, Singapore
– Nicola Ferro, University of Padova, Italy
– Kalervo Järvelin, University of Tampere, Finland
– Gareth J. F. Jones, Dublin City University, Ireland
– Mandar Mitra, Indian Statistical Institute, India
– Douglas W. Oard, University of Maryland, USA
– Maarten de Rijke, University of Amsterdam, the Netherlands
– Tetsuya Sakai, Waseda University, Japan
– Mark Sanderson, RMIT University, Australia
– Ian Soboroff, NIST, USA
– Emine Yilmaz, University College London, United Kingdom
Review Process

Each task proposal was reviewed by 4 or more PC members. Of the accepted tasks, five are core tasks and 1+1 (CENTRE and FinNum) are pilot tasks.
NTCIR-14 General Schedule
Mar 20, 2018: NTCIR-14 Kickoff
May 15, 2018: Task registration due
Jun 2018: Dataset release
Jun-Jul 2018: Dry run
Aug-Oct 2018: Formal run
Feb 1, 2019: Evaluation result release
Feb 1, 2019: Task overview paper release (draft)
Mar 15, 2019: Submission due of participant papers
May 1, 2019: Camera-ready participant paper due
Jun 2019: NTCIR-14 Conference & EVIA 2019 at NII, Tokyo
Focuses of NTCIR-14
[Figure: the NTCIR-14 tasks grouped by focus.]
– Search (heterogeneous data): OpenLiveQ (search for questions), WWW (search for web pages), Lifelog (search for lifelog data)
– Reproduce: CENTRE (reproduce the best practices)
– Summarize: QALab (summarize dialog data)
– Generate: STC (generate dialogues)
– Understand: FinNum (understand numeric info)
NTCIR-14 Tasks

Core tasks:
– Lifelog-3 (Lifelog Search Task)
– OpenLiveQ-2 (Open Live Test for Question Retrieval)
– QALab-PoliInfo (Question Answering Lab for Political Information)
– STC-3 (Short Text Conversation)
– WWW-2 (We Want Web)

Pilot tasks:
– CENTRE (CLEF/NTCIR/TREC REproducibility)
– FinNum (Fine-Grained Numeral Understanding in Financial Tweet)
Number of Active Participants
Task (NTCIR rounds): # of active participants
– QA Lab for Entrance Exam (QALab) (11, 12, 13) → QA Lab for Political Information (QALab-PoliInfo) (14): 13
– Personal Lifelog Organisation & Retrieval (Lifelog) (12, 13, 14): 6
– Short Text Conversation (STC) (12, 13, 14): 13
– Open Live Test for Question Retrieval (OpenLiveQ) (13, 14): 4
– We Want Web (WWW) (13, 14): 4
– Fine-Grained Numeral Understanding in Financial Tweet (FinNum) (14): 6
– CLEF/NTCIR/TREC REproducibility (CENTRE) (14): 1
– Total: 47
Active participants: research groups that submitted final results for evaluation.
Jargon: Test Collection
[Figure: an IR test collection. The document collection is indexed by a search system; topics are the input, and relevance judgements (graded from highly relevant to irrelevant) are the expected output.]

In general, a test collection consists of the input to a system and the expected output, which is used for evaluating the output.
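As a concrete illustration, here is a minimal sketch of how a test collection drives evaluation; the topic and document IDs, the grade scale, and the single-topic run are all hypothetical.

    # A hypothetical miniature IR test collection.
    topics = {"T001": "neural ranking models"}  # input
    judgements = {                              # expected output:
        ("T001", "D01"): 2,                     # 2 = highly relevant
        ("T001", "D07"): 1,                     # 1 = relevant
        ("T001", "D09"): 0,                     # 0 = irrelevant
    }

    # A run: the system's ranked output for each topic.
    run = {"T001": ["D07", "D02", "D01"]}

    # Evaluate: fraction of retrieved documents judged relevant (grade > 0).
    for topic_id, ranking in run.items():
        relevant = sum(judgements.get((topic_id, doc), 0) > 0 for doc in ranking)
        print(topic_id, "precision =", relevant / len(ranking))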
Jargon: Training, Development, and Test Sets
[Figure: the training set is used to train the system, the development set to tune it, and the test set to evaluate the final system output.]
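A minimal sketch of such a three-way split, assuming a list of labelled examples; the fractions, seed, and function name are illustrative choices, not part of any NTCIR task design.

    import random

    def split_dataset(examples, dev_frac=0.1, test_frac=0.1, seed=42):
        # Shuffle reproducibly, then carve off the test and development sets;
        # the remainder is the training set.
        examples = list(examples)
        random.Random(seed).shuffle(examples)
        n_test = int(len(examples) * test_frac)
        n_dev = int(len(examples) * dev_frac)
        test = examples[:n_test]
        dev = examples[n_test:n_test + n_dev]
        train = examples[n_test + n_dev:]
        return train, dev, test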
Jargon: Run / Dry Run / Formal Run

– Run: a result of a single execution of a developed system
– Dry run: a preliminary trial for improving the task design and familiarizing participants with the task
– Formal run: an actual trial where submissions and their results are officially recorded
Jargon: Evaluation Metric

– General evaluation metrics: precision, recall, F1-measure (defined below)
– IR evaluation metrics: MAP, nDCG, ERR, Q-measure
– Summarization evaluation metrics: ROUGE
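As one example, a minimal sketch of nDCG under a common formulation (gain divided by a log2 rank discount); other formulations exist, and the function names here are illustrative.

    import math

    def dcg(gains):
        # Discounted cumulative gain: rank i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    def ndcg(ranked_gains, all_gains, k=10):
        # Normalize DCG@k by the DCG of the ideal (descending-gain) ranking.
        ideal = dcg(sorted(all_gains, reverse=True)[:k])
        return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0

    print(ndcg([1, 2, 0], [2, 1, 0, 0], k=3))  # toy graded-relevance example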
[Figure: an assessor identifies the set of correct items C; the system produces the output set S.]

Precision: $P = \frac{|S \cap C|}{|S|}$
Recall: $R = \frac{|S \cap C|}{|C|}$
F1-measure: $F_1 = \frac{2PR}{P + R}$
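These three definitions translate directly into a short set-based sketch (the function name is illustrative):

    def precision_recall_f1(output, correct):
        # output, correct: sets of items; precision and recall are 0 when undefined.
        overlap = len(output & correct)
        p = overlap / len(output) if output else 0.0
        r = overlap / len(correct) if correct else 0.0
        f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
        return p, r, f1

    print(precision_recall_f1({"a", "b", "c"}, {"b", "c", "d"}))  # ≈ (0.67, 0.67, 0.67)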
Please Google or Bing these metrics for details; they will be used in the overview presentations.
ENJOY THE CONFERENCE!