Overview of NTCIR-14
Makoto P. Kato, University of Tsukuba
Yiqun Liu, Tsinghua University
Introduction to NTCIR

Evaluation Forum
– An opportunity for researchers to get together to solve challenging research problems based on cooperation between task organizers and participants: organizers design tasks, build test collections, and evaluate participants' systems' performances in the tasks

[Figure: task organizers provide tasks and test collections to participants; participants submit system output and receive evaluation results.]
Benefits of Evaluation Forums

For the organizers and the community:
– Can obtain many findings on a certain problem
– Can share the workload of building a large-scale test collection
– Can draw attention to a certain research direction

For the participants:
– Can focus on solving problems
– Can tackle well-recognized problems with new resources, or novel problems at an early stage
– Can demonstrate the performance of their developed systems in a fair comparison
Task Selection Procedure

NTCIR tasks are selected from proposals from task organizers.

[Figure: task organizers submit task proposals to the program committee; PC members write reviews, and the PC co-chairs make the decision.]

Please consider proposing a task at NTCIR!
NTCIR-14 Program Committee (PC)
– University of Delaware, USA
– Hsin-Hsi Chen, National Taiwan University, Taiwan
– Tat-Seng Chua, National University of Singapore, Singapore
– Nicola Ferro, University of Padova, Italy
– Kalervo Järvelin, University of Tampere, Finland
– Gareth J. F. Jones, Dublin City University, Ireland
– Mandar Mitra, Indian Statistical Institute, India
– Douglas W. Oard, University of Maryland, USA
– Maarten de Rijke, University of Amsterdam, the Netherlands
– Tetsuya Sakai, Waseda University, Japan
– Mark Sanderson, RMIT University, Australia
– Ian Soboroff, NIST, USA
– Emine Yilmaz, University College London, United Kingdom
Review Process

Each task proposal was reviewed by 4 or more PC members. Of the accepted tasks, five are core tasks and 1+1 (CENTRE and FinNum) are pilot tasks.
NTCIR-14 General Schedule
Mar 20, 2018: NTCIR-14 Kickoff
May 15, 2018: Task registration due
Jun 2018: Dataset release
Jun-Jul 2018: Dry run
Aug-Oct 2018: Formal run
Feb 1, 2019: Evaluation result release
Feb 1, 2019: Task overview paper release (draft)
Mar 15, 2019: Submission due of participant papers
May 1, 2019: Camera-ready participant paper due
Jun 2019: NTCIR-14 Conference & EVIA 2019 at NII, Tokyo
Focuses of NTCIR-14
[Figure: the NTCIR-14 tasks grouped by focus.]
– Search (heterogeneous data): OpenLiveQ (search for questions), WWW (search for web pages), Lifelog (search for lifelog data)
– Reproduce: CENTRE (reproduce the best practices)
– Summarize: QALab (summarize dialog data)
– Generate: STC (generate dialogues)
– Understand: FinNum (understand numeric info)
NTCIR-14 Tasks

Core tasks:
– Lifelog-3 (Lifelog Search Task)
– OpenLiveQ-2 (Open Live Test for Question Retrieval)
– QALab-PoliInfo (Question Answering Lab for Political Information)
– STC-3 (Short Text Conversation)
– WWW-2 (We Want Web)

Pilot tasks:
– CENTRE (CLEF/NTCIR/TREC REproducibility)
– FinNum (Fine-Grained Numeral Understanding in Financial Tweet)
Number of Active Participants
Task (NTCIR rounds): # of active participants
– QA Lab for Entrance Exam (QALab) (11, 12, 13) → QA Lab for Political Information (QALab-PoliInfo) (14): 13
– Personal Lifelog Organisation & Retrieval (Lifelog) (12, 13, 14): 6
– Short Text Conversation (STC) (12, 13, 14): 13
– Open Live Test for Question Retrieval (OpenLiveQ) (13, 14): 4
– We Want Web (WWW) (13, 14): 4
– Fine-Grained Numeral Understanding in Financial Tweet (FinNum) (14): 6
– CLEF/NTCIR/TREC REproducibility (CENTRE) (14): 1
– Total: 47
Active participants: research groups that submitted final results for evaluation.
Jargon: Test Collection
[Figure: an IR test collection. The document collection is indexed by a search system; topics are the input, and relevance judgements (graded from highly relevant to irrelevant) are the expected output.]

In general, a test collection consists of the input to a system and the expected output, which is used for evaluating the output.
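As a concrete illustration, here is a minimal sketch of how a test collection drives evaluation; the topic and document IDs, the grade scale, and the single-topic run are all hypothetical.

    # A hypothetical miniature IR test collection.
    topics = {"T001": "neural ranking models"}  # input
    judgements = {                              # expected output:
        ("T001", "D01"): 2,                     # 2 = highly relevant
        ("T001", "D07"): 1,                     # 1 = relevant
        ("T001", "D09"): 0,                     # 0 = irrelevant
    }

    # A run: the system's ranked output for each topic.
    run = {"T001": ["D07", "D02", "D01"]}

    # Evaluate: fraction of retrieved documents judged relevant (grade > 0).
    for topic_id, ranking in run.items():
        relevant = sum(judgements.get((topic_id, doc), 0) > 0 for doc in ranking)
        print(topic_id, "precision =", relevant / len(ranking))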
Jargon: Training, Development, and Test Sets
[Figure: the training set is used to train the system, the development set to tune it, and the test set to evaluate the final system output.]
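A minimal sketch of such a three-way split, assuming a list of labelled examples; the fractions, seed, and function name are illustrative choices, not part of any NTCIR task design.

    import random

    def split_dataset(examples, dev_frac=0.1, test_frac=0.1, seed=42):
        # Shuffle reproducibly, then carve off the test and development sets;
        # the remainder is the training set.
        examples = list(examples)
        random.Random(seed).shuffle(examples)
        n_test = int(len(examples) * test_frac)
        n_dev = int(len(examples) * dev_frac)
        test = examples[:n_test]
        dev = examples[n_test:n_test + n_dev]
        train = examples[n_test + n_dev:]
        return train, dev, test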
Jargon: Run / Dry Run / Formal Run

– Run: a result of a single execution of a developed system
– Dry run: a preliminary trial for improving the task design and familiarizing participants with the task
– Formal run: an actual trial where submissions and their results are officially recorded
Jargon: Evaluation Metric

– General evaluation metrics: precision, recall, F1-measure (defined below)
– IR evaluation metrics: MAP, nDCG, ERR, Q-measure
– Summarization evaluation metrics: ROUGE
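As one example, a minimal sketch of nDCG under a common formulation (gain divided by a log2 rank discount); other formulations exist, and the function names here are illustrative.

    import math

    def dcg(gains):
        # Discounted cumulative gain: rank i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    def ndcg(ranked_gains, all_gains, k=10):
        # Normalize DCG@k by the DCG of the ideal (descending-gain) ranking.
        ideal = dcg(sorted(all_gains, reverse=True)[:k])
        return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0

    print(ndcg([1, 2, 0], [2, 1, 0, 0], k=3))  # toy graded-relevance example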
[Figure: an assessor identifies the set of correct items C; the system produces the output set S.]

Precision: $P = \frac{|S \cap C|}{|S|}$
Recall: $R = \frac{|S \cap C|}{|C|}$
F1-measure: $F_1 = \frac{2PR}{P + R}$
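These three definitions translate directly into a short set-based sketch (the function name is illustrative):

    def precision_recall_f1(output, correct):
        # output, correct: sets of items; precision and recall are 0 when undefined.
        overlap = len(output & correct)
        p = overlap / len(output) if output else 0.0
        r = overlap / len(correct) if correct else 0.0
        f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
        return p, r, f1

    print(precision_recall_f1({"a", "b", "c"}, {"b", "c", "d"}))  # ≈ (0.67, 0.67, 0.67)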
Please Google or Bing these metrics for details; they will be used in the overview presentations.
ENJOY THE CONFERENCE!