Tasks at Past NTCIRs



slide-1
SLIDE 1

NTCIR: NII Testbeds and Community for Information access Research

Research Infrastructure for Evaluating IA

Project started in late 1997

Once every 18 months

A series of evaluation workshops designed to enhance research in information-access technologies by providing an infrastructure for large-scale evaluations.

  • Data sets, evaluation methodologies, and a forum

[Chart: participation per round, NTCIR-5 to NTCIR-9, including number of countries]

Data sets (Test collections or TCs)

  • Genres: scientific, news, patents, web, CQA, Wikipedia, entrance exams
  • Languages: Chinese, Korean, Japanese, and English

Tasks (Research Areas)

  • IR: cross-lingual tasks, patents, web, Geo
  • QA: monolingual tasks, cross-lingual tasks
  • Summarization, trend info., patent maps
  • Opinion analysis, text mining

Community-based Research Activities

[Chart: number of active participant groups and number of registered groups per round, NTCIR-1 to NTCIR-5]

slide-2
SLIDE 2

Tasks at Past NTCIRs

[Table: task areas by NTCIR round; the cell marks were lost in extraction. Recoverable row labels: Cross Link Discovery, Module-Based, IR for Focused Domain, Interactive, Crosslingual, Text Retrieval, Summarization / Consolidation]

slide-3
SLIDE 3

NTCIR-9/10: Objectives

  • Solid foundation

– New structure
– Sustainability of research

  • Task diversity

– Covers a wide context in Information Access
– Studies rich media types

  • Community-led task organisation

– Seed funding

  • Promotion of research resources

– Showcase in the NTCIR Meeting

slide-4
SLIDE 4

A contest of strength among search systems (検索システムの力くらべ)

http://research.nii.ac.jp/ntcir/

slide-5
SLIDE 5

NTCIR-10: Structure

  • General Co-Chairs

– Tsuneaki Kato (U Tokyo)
– Noriko Kando (NII)
– Douglas W. Oard (University of Maryland)
– Mark Sanderson (RMIT)

  • Task Organizers (TO)

– 48 researchers all over the world
– Participants (you!)

  • Program Co-Chairs

– Hideo Joho (U Tsukuba)
– Tetsuya Sakai (MSRA)

  • EVIA 2013 Co-Chairs

– Ruihua Song (MSRA)
– William Webber (University of Maryland)

slide-6
SLIDE 6


http://research.nii.ac.jp/ntcir/ntcir-10/organizers.html

slide-7
SLIDE 7

NTCIR-10 Program Committee

  • Charles Clarke (University of Waterloo, Canada)
  • Kalervo Järvelin (University of Tampere, Finland)
  • Hideo Joho (Co-chair, University of Tsukuba, Japan)
  • Gareth Jones (Dublin City University, Ireland)
  • Noriko Kando (NII, Japan)
  • Tsuneaki Kato (The University of Tokyo, Japan)
  • Douglas W. Oard (University of Maryland, US)
  • Tetsuya Sakai (Co-chair, Microsoft Research Asia, PRC)
  • Mark Sanderson (RMIT, Australia)
  • Ian Soboroff (NIST, US)

slide-8
SLIDE 8

NTCIR-9 Tasks

CORE TASKS

  • [Intent] Intent (with One-Click Access)

– Subtask 1CLICK -> Core

  • [RITE] Recognizing Inference in Text

  • [GeoTime] Geotemporal information retrieval (x)
  • [SpokenDoc] IR for Spoken Documents

PILOT TASKS

  • [CrossLink] Cross-lingual Link Discovery -> Core
  • [Vis-Ex] Interactive Visual Exploration (x)
  • [PatentMT] Patent Machine Translation -> Core


slide-9
SLIDE 9

NTCIR-10 Accepted Tasks

Core

  • [Intent-2] Search intent and diversification
  • [1Click-2] One-Click Access
  • [RITE-2] Recognizing Inference in Text
  • [SpokenDoc-2] IR for Spoken Documents
  • [PatentMT-2] Cross-lingual access to Patent Docs
  • [CrossLink-2] Cross-lingual Link Discovery

Pilot

  • [Math] Access to mathematical contents


slide-10
SLIDE 10

Cross-lingual Link Discovery

Article: Australia

… Ranked third in the Index of Economic Freedom (2010),[178] Australia is the world's thirteenth largest economy and has the ninth highest per capita GDP; higher than that of the United Kingdom, Germany, France, Canada, Japan, and the United States. The country was ranked second in the United Nations 2010 Human Development Index and first in Legatum's 2008 Prosperity Index.[179] All of Australia's major cities fare well in global comparative livability surveys;[180] Melbourne reached first place on The Economist's 2011 World's Most Livable Cities list, followed by Sydney, Perth, and Adelaide in sixth, eighth, and ninth place respectively.[181] Total government debt in Australia is about $190 billion.[182] Australia has among the highest house prices and some of the highest household debt levels in the world. …

  • Links in other languages?
  • New articles?
  • Missing links?
  • Not what we are looking for?
  • What about other relevant links?

No link was created for this term. To find articles in the languages we prefer, traditionally we search and then translate.

  • All about multi-lingual knowledge discovery in knowledge bases (e.g. Wikipedia)
  • All about easy and efficient information access

Cross-lingual Link Discovery: cross-lingual links, new links, better links, more options

How to automatically create cross-lingual links for a document if no links exist yet?

经济学人 … The Economist … エコノミスト … 이코노미스트 …
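
The question on this slide can be illustrated with a minimal sketch. It assumes a toy dictionary that maps English article titles to their titles in other languages; the dictionary, the function name `discover_links`, and the plain substring matching are all hypothetical and are not the method used by any CrossLink participant.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: English title -> {language code: title}.
# A real system would draw on a resource such as Wikipedia instead.
LANGUAGE_LINKS: Dict[str, Dict[str, str]] = {
    "The Economist": {"zh": "经济学人", "ja": "エコノミスト", "ko": "이코노미스트"},
    "Sydney": {"ja": "シドニー"},
}


def discover_links(text: str, target_lang: str) -> List[Tuple[str, str]]:
    """Return (anchor, target-language title) pairs for known titles found in text."""
    links: List[Tuple[str, str]] = []
    # Try longer titles first so that multi-word anchors are preferred.
    for title in sorted(LANGUAGE_LINKS, key=len, reverse=True):
        target = LANGUAGE_LINKS[title].get(target_lang)
        if target is not None and title in text:
            links.append((title, target))
    return links


if __name__ == "__main__":
    passage = "Melbourne reached first place on The Economist's 2011 list, followed by Sydney."
    print(discover_links(passage, "ja"))
```

Exact string matching is only a starting point; anchor detection and ranking of candidate targets are where the real difficulty lies.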

slide-11
SLIDE 11

INTENT-2: Underspecified Query

Harry Potter

slide-12
SLIDE 12

Search Results Diversification

Produce a search engine result page (SERP) that satisfies the various users' intents behind an underspecified query
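
One simple way to picture diversification is a greedy re-ranker that prefers documents covering intents not yet represented on the page. The sketch below is an illustration under stated assumptions, not the INTENT task's evaluation method or any participant's system: the function name `diversify`, the document identifiers, and the per-document intent labels are hypothetical.

```python
from typing import Dict, List, Set


def diversify(candidates: List[str], intents: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedily pick up to k documents, preferring ones that cover new intents.

    `candidates` is assumed to be in relevance order already; ties are broken
    in favour of the higher-ranked document.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # max() returns the first maximal element, so relevance order breaks ties.
        best = max(remaining, key=lambda d: len(intents.get(d, set()) - covered))
        selected.append(best)
        covered |= intents.get(best, set())
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical intents behind the underspecified query "Harry Potter".
    ranked = ["book1", "book2", "film1", "game1"]
    labels = {"book1": {"books"}, "book2": {"books"},
              "film1": {"films"}, "game1": {"games"}}
    print(diversify(ranked, labels, 3))
```

In practice the intents of a document are not known in advance and must be estimated, which is part of what makes the task hard.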

slide-13
SLIDE 13

INTENT-1 & INTENT-2

INTENT-1@NTCIR-9, INTENT-2@NTCIR-10

[Diagram: INTENT-1 topics, INTENT-2 topics, INTENT-1 systems, INTENT-2 systems]

  • INTENT-1 systems tested on INTENT-2 topics (tests the relationship between the two topic sets)
  • INTENT-2 systems tested on INTENT-1 topics (shows improvements from INTENT-1)

slide-14
SLIDE 14

Traditional Search = More-than-One Click Access

Enter query: 湘南厚木病院 (Shonan Atsugi Hospital)

Click SEARCH button

Scan ranked list of URLs

Click URL

Read URL contents

Get all desired information

slide-15
SLIDE 15

One Click Access

Enter query: 湘南厚木病院 (Shonan Atsugi Hospital)

Click SEARCH button

The system outputs X-string:

Phone: 046-223-3636. Fax: 046-223-3630. Address: 118-1 Nurumizu, Atsugi, 243-8551. Email: soumu@shonan-atsugi.jp. Visiting hours: general ward Mon-Fri 15-20; Sat & Holidays 13-20 / Intensive Care Unit (ICU) 11-11:30, 15:30, 19-19:30.

Get all desired information

Go beyond the "ten-blue-link" paradigm in Web search

Particularly important for mobile search

slide-16
SLIDE 16

NTCIR-10 RITE-2 Task (Recognizing Inference in TExt)

Yotaro Watanabe1, Yusuke Miyao2, Junta Mizuno1, Tomohide Shibata3, Cheng-Wei Lee4, Chuan-Jie Lin5, Shuming Shi6, Koichi Takeda7, Hiroshi Kanayama7, Hideki Shima8, Teruko Mitamura8

1 Tohoku University
2 National Institute of Informatics
3 Kyoto University
4 Academia Sinica
5 National Taiwan Ocean University
6 Microsoft Research Asia
7 IBM Research
8 Carnegie Mellon University

NTCIR-10 Kick-off Event, March 8th, 2012

slide-17
SLIDE 17

Three Subtasks in NTCIR-9 RITE

Binary-class subtask

  • Given a text pair <t1, t2>, your system detects whether t1 entails a hypothesis t2 or not

Multi-class (5-way) subtask

  • Given a text pair <t1, t2>, your system detects whether t1 and t2

– Have an entailment relation: forward (t1 → t2) / reverse (t1 ← t2) / bidirectional (t1 ↔ t2)
– Do not have an entailment relation: contradiction / unknown (cannot be determined)

RITE4QA subtask

  • Same as the binary-class subtask, but in a Question Answering scenario

– t2 is a question converted to an affirmative statement, with the wh-word replaced by an answer candidate. t1 is a sentence/paragraph containing the answer candidate.
– What's the impact of RITE on a practical dataset/task?

Entrance Exam subtask
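
To make the binary-class setting concrete, here is a deliberately naive baseline sketch: it predicts that t1 entails t2 when most of the words in t2 also appear in t1. This is an assumption-laden illustration, not a RITE system. The function names, the 0.8 threshold, and the English whitespace-style tokenization are all hypothetical, and the actual RITE data is in Japanese and Chinese, which would need a proper word segmenter.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens; adequate only for space-delimited languages."""
    return set(re.findall(r"\w+", text.lower()))


def entails(t1: str, t2: str, threshold: float = 0.8) -> bool:
    """Guess that t1 entails t2 if enough of t2's words are covered by t1."""
    hypothesis = tokens(t2)
    if not hypothesis:
        return False
    coverage = len(hypothesis & tokens(t1)) / len(hypothesis)
    return coverage >= threshold


if __name__ == "__main__":
    text = "Melbourne reached first place on the 2011 list of most livable cities"
    print(entails(text, "Melbourne reached first place"))
    print(entails(text, "Sydney reached first place"))
```

Word overlap ignores negation, word order, and paraphrase, which is why it serves only as a lower bar that entailment systems are expected to beat.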


slide-18
SLIDE 18

NTCIR-10 RITE-2

  • Binary-Class (BC)
  • Multi-Class (MC): 4 types
  • Entrance Exam

– BC
– Retrieval

  • Hypothesis + whole (Wikipedia + textbook) corpus
slide-19
SLIDE 19

Spoken Doc Target Speech Data

  • Type of speech data

– Broadcast news speech, podcast, lecture speech…

[Slide callouts: "Having noisier words"; "Our target"]

  • Databases

– CSJ (Corpus of Spontaneous Japanese)

  • 2,702 lecture speeches, 628 hours

– New target! Real academic meeting lectures collection

  • Over 70 speeches from the spoken document processing workshops

slide-20
SLIDE 20

Spoken Document Retrieval

  • Ad-hoc information retrieval from lecture speeches
  • Finding the passages that include information relevant to a given query topic
  • Query

– Text query
– Spoken query (optional)

slide-21
SLIDE 21

Goals of PatentMT

  • To develop challenging and significant practical research into patent machine translation.
  • To investigate the performance of state-of-the-art machine translation systems in terms of patent translations involving Japanese, English, and Chinese.
  • To compare the effects of different methods of patent translation by applying them to the same test data.
  • To create publicly-available parallel corpora of patent documents and human evaluations of MT results for patent information processing research.
  • To drive machine translation research, which is an important technology for cross-lingual access of information written in unknown languages.
  • The ultimate goal is fostering scientific cooperation.

slide-22
SLIDE 22

Findings of PatentMT at NTCIR-9

  • SMT was the best system for Chinese to English and English to Japanese patent translation.

– This is the first time SMT has been shown to equal or better the quality of top-level RBMT for English to Japanese patent translation.
– The pre-ordering method of NTT-UT for SMT is very effective for English to Japanese patent translation.

  • 80% of patent sentences could be understood in the output of the best system for Chinese to English patent translation.
  • RBMT was the best system for Japanese to English patent translation.

slide-23
SLIDE 23

The Goal of NTCIR-10 Math Task

  • NTCIR Math Task aims at exploring methods for mathematical content access through its task design and the construction of the evaluation dataset.

[Formula]

In science, a formula is a concise way of expressing information, or a general relationship between quantities. (Wikipedia)

A mathematical relationship or rule expressed in symbols. (Oxford Dictionary)

slide-24
SLIDE 24

NISTEP Policy Study

Mathematics as deserted science in Japanese S&T policy

  • Current situation on mathematical sciences research in major countries and need for mathematical sciences from the science in Japan (2006.5)
  • Q. Is mathematics related to your research?

77% of researchers across a diversity of disciplines answered 'YES' (Somewhat related / Related / Strongly related).

Math Information Access

Representations

  • Character sequence (LaTeX source)

log(z_1)+log(z_2) == log(Z_1, Z_2)\\; z_1+z_2 \geq 0

  • Embedded image (png, gif, ...)
  • Web-browsable XML

<math xmlns='http://www.w3.org/1998/Math/MathML' mathematica:form='TraditionalForm' xmlns:mathematica='http://www.wolfram.com/XML/'> <semantics> <mrow> <mrow> <mrow> <mrow> <mi> log </mi> <mo> &#8289; </mo> <mo> ( </mo> <msub> <mi> z </mi> <mn> 1 </mn> </msub>

  • XML for math semantics

<annotation-xml encoding='MathML-Content'> <apply> <ci> Condition </ci> <apply> <eq /> <apply> <plus /> <apply> <ln /> <apply> <ci> Subscript </ci> <ci> z </ci> <cn type='integer'> 1 </cn> </apply> </apply> <apply> <ln /> <apply> <ci> Subscript </ci>

Requirement

  • Strict Content MathML (W3C recommendation)
  • OpenMath

Resources

  • Mathematical knowledge-base and math ontology
  • Wikipedia: 26,566 mathematics articles
  • Wolfram MathWorld: 13,081 entries (Sep. 13, 2011)
  • Wolfram Functions site: 307,409 formulas (Sep. 15, 2011)

slide-25
SLIDE 25

NTCIR-9 Pilot Vis-Ex Task Outline

[Diagram labels: Organizer (provide framework, provide baseline); Participants (submit); IAES core; Browser (log collection); Editor; Info. retrieval engine; Documents display; etc. func.; Experimental tasks; Laboratory experiments with human subjects, mainly by the organizer; Log collection]

It is important to discuss the following through the workshop:

  • I/F between an IAES core and the framework
  • Taxonomy of process primitives
  • Detailed design of the laboratory experiments
slide-26
SLIDE 26

NTCIR's Grand Challenge

Impact to real challenges in our society

NTCIR's Infrastructure for IA Evaluation

[Diagram labels: Infrastructure; Work task, roles; Interaction; Between-document; Document structure; Time. Tasks: Intent, RITE, PatentMT, CrossLink, Vis-Ex, SpokenDoc, GeoTime. Genres: News, Web, Legal, Speech]

slide-27
SLIDE 27

Special Issues

  • Diversified Search (IRJ)
  • RITE (TALIP)
  • LT for IA (NLPJ)

  • PATENT (IRJ)


slide-28
SLIDE 28

Thank you for your attention!

http://research.nii.ac.jp/ntcir/ntcir-10/

Registration is still open

Conference: 18-21 June 2013
EVIA: 18 June 2013

Q & A

For further enquiries, contact the NTCIR office: ntc-secretariat nii.ac.jp