

ntcir4-ov 2004-06-02 1

Overview of the 4th NTCIR Workshop

Noriko Kando

National Institute of Informatics
http://research.nii.ac.jp/ntcir/
kando@nii.ac.jp

ntcir4-ov 2004-06-02 2

NTCIR Workshop is:

A series of evaluation workshops designed to enhance research in information access technologies by providing an infrastructure for large-scale evaluation.

Project started late 1997; held about once per 1½ years

1st: Nov 1, 1998 – Sept 1, 1999
2nd: June 2000 – March 2001
3rd: Sept 2001 – Oct 2002 <- results
4th: Apr 2003 – June 2004 <- dry run & progress

* NII Test Collection for Information Retrieval systems

* Co-sponsored by NII and the MEXT Grant-in-Aid on Informatics

ntcir4-ov 2004-06-02 3

Large-scale reusable test collections for experiments on information access technologies

Purposes of NTCIR

NTCIR enables:

* An international forum for research groups
* Cross-system comparison on a common evaluation infrastructure
* Rapid improvement of effectiveness and technology transfer of the latest research
* Investigation of evaluation methodologies and metrics

ntcir4-ov 2004-06-02 4

Evaluation Workshops

  • An "evaluation workshop" or simply an "evaluation":
  • provides participants with a set of data usable for experiments and unified procedures for evaluation

– Each participating research group conducts experiments with various approaches and can participate with its own purpose.

  • Successful example: TREC
  • Implications are various

ntcir4-ov 2004-06-02 5

Focus of NTCIR

Lab-type IR Test / New Challenges / Forum for Researchers

Asian languages / cross-language; variety of genres; parallel/comparable corpora; intersection of IR + NLP, to make the information in documents more usable for users! Realistic evaluation / user tasks; idea exchange; discussion and investigation of evaluation methods/metrics

ntcir4-ov 2004-06-02 6

Tasks (Research Areas) of NTCIR Workshops

(Timeline chart: tasks across the workshops. Tasks: Text Summarization, Question Answering, Term Extraction/Role Analysis, Web Retrieval, Patent Retrieval, Cross-lingual IR, Japanese IR. Workshops: 1st (Nov 98) through 4th (Apr 2003), about once per 1½ years; the project started late 1997.)


ntcir4-ov 2004-06-02 7

NTCIR Test Collections

(Table of the NTCIR test collections from NTCIR-1 through NTCIR-4: collection, task, genre, size, document language(s), topic language(s), and relevance/answer data. Language codes: Ct: Traditional Chinese, Cs: Simplified Chinese, K: Korean, J: Japanese, E: English. The collections cover news IR (CLIR), academic IR, patent IR, Web IR, news QA, and news summarization, with sizes ranging from 8 MB up to 45 GB.)

ntcir4-ov 2004-06-02 8

Tasks of NTCIR-4

CLIR: Chinese, Korean, Japanese, English

  • Single Language IR; Bilingual CLIR; Multilingual CLIR; Pivot CLIR

Patent Retrieval

  • Main: Invalidity Search (search patents by patent)
  • Feasibility: Automatic Patent Map Generation

Question Answering

  • 5 possible answers; 1 set of all the answers; series of Qs

Text Summarization

  • Multiple-document summarization; automatic evaluation!

Web

  • Informational; Navigational; Geographic; Clustering

ntcir4-ov 2004-06-02 9

NTCIR Workshop 4 (2003-2004) Organizers

+CLIR: Kuang-hua Chen, NTU; Sukhoon Lee, NCU; Kazuaki Kishida, Surugadai U; Hsin-Hsi Chen, NTU; Sung Hyon Myaeng, IIU; Kazuko Kuriyama, Shirayuri U; Noriko Kando, NII
+PATENT: Atsushi Fujii, Tsukuba U; Makoto Iwayama, Hitachi; Noriko Kando, NII
+Question Answering: Junichi Fukumoto, Ritsumeikan U; Tsuneaki Kato, U Tokyo; Fumito Masui, Mie U
+Text Summarization: Tsutomu Hirao, NTT-CS; Takahiro Fukusima, Otemongakuin U; Hidetsugu Nanba, Hiroshima C U; Manabu Okumura, TITEC
+WEB: Koji Eguchi, NII; Keizo Oyama, NII
General chair: Jun Adachi, NII
Program chair: Noriko Kando, NII

ntcir4-ov 2004-06-02 10

NTCIR workshop: Number of Participating Groups

74 groups from 10 countries

(Chart of participating groups and countries per workshop: 28 groups / 6 countries (1st), 36 / 8 (2nd), 65 / 9 (3rd), 74 / 10 (4th).)

ntcir4-ov 2004-06-02 11

Number of Participants by Tasks

(Chart: number of participating groups per task for the 1st (1998-9), 2nd (2000-1), 3rd (2001-2), and 4th (2003-4) workshops; tasks: QA, Summarization, Term Extraction, Web Retrieval, Patent Retrieval, Non-Japanese IR, CLIR, Japanese IR.)

ntcir4-ov 2004-06-02 12

[CLIR] Chinese Academy of Sciences (China PRC); Clairvoyance Corporation and Justsystem (USA); Communications Research Laboratory-1 (Japan); Fu Jen Catholic University (Taiwan ROC); Hong Kong Polytechnic University (Hong Kong, China PRC); Hummingbird (Canada); Institute for Infocomm Research (Singapore); Korea University (Korea); Nara Institute of Science and Technology-1 (Japan); National Institute of Informatics-1 (Japan); National Taiwan University (Taiwan ROC); Oki Electric-1 (Japan); PATOLIS (Japan); Pohang University of Science and Technology (Korea); Queens College, City University of New York (USA); Ricoh-1 (Japan); Royal Melbourne Institute of Technology (Australia); Thomson Legal and Regulatory (USA); Tianjin University (China PRC); Toshiba (Japan); University of Arizona (USA); University of California Berkeley (USA); University of Chicago (USA); University of Neuchatel (Switzerland); University of Tsukuba (Japan); Yokohama National University (Japan)

[PATENT] Fujitsu Laboratories (Japan); IBM Research (Japan); Japan Patent Information Organization / Hitachi (Japan); Nagaoka University of Technology (Japan); NTT DATA (Japan); Osaka Kyoiku University (Japan); PATOLIS (Japan); Ricoh-2 (Japan); Tokyo Institute of Technology (Japan); University of Tsukuba (Japan)

[QAC] AIST / University of Nagoya / University of Tsukuba (Japan); Communications Research Laboratory-1 (Japan); Iwate Prefectural University (Japan); Keio University (Japan); Matsushita Electric Industrial-1 (Japan); Mie University (Japan); Nagaoka University of Technology (Japan); Nara Institute of Science and Technology-2 (Japan); New York University (USA) / Communications Research Laboratory-2 (Japan); NTT Communication Science Laboratories-1 (Japan); NTT DATA (Japan); Oki Electric-2 (Japan); Pohang University of Science and Technology (Korea); Ritsumeikan University (Japan); Toshiba (Japan); Toyohashi University of Technology-1 (Japan); University of Tokyo-1 (Japan); Yokohama National University (Japan)

[TSC] Communications Research Laboratory-2 (Japan) / New York University (USA); Graduate University for Advanced Studies (Japan); Hokkaido University (Japan); Pohang University of Science and Technology (Korea); Ritsumeikan University (Japan); Toyohashi University of Technology-1 (Japan); University of Electro-Communications (Japan); University of Tokyo-1 (Japan); Yokohama National University (Japan)

[WEB] Hokkaido University (Japan); Ibaraki University (Japan); Matsushita Electric Industrial-2 (Japan); NEC (Japan); NII-2 / Univ. of Tokyo-2 / KYA Group (Japan); NTT Communication Science Laboratories-2 (Japan); Osaka Kyoiku University (Japan); Tokyo Metropolitan University (Japan); Toyohashi University of Technology-1 (Japan); Toyohashi University of Technology-2 (Japan); University of Tsukuba / University of Nagoya (Japan)

74 groups from 10 countries & areas

Number of Participants


ntcir4-ov 2004-06-02 14

Schedule for NTCIR-4

April 2003: Document Release
June-Sept 2003: Dry Run
Oct-Dec 2003: Formal Run
Feb 20, 2004: Evaluation Results Returned
Late March 2004: Paper Submission; Open Submission Session; ACM-TALIP Special Issue Recommendation
2-5 June 2004: Conference at NII, Tokyo, Japan
15 July 2004: ACM-TALIP Due
31 July 2004: Revised Papers for the Proceedings

ntcir4-ov 2004-06-02 15

What's New to NTCIR

  • Open Submission Session
  • ACM-TALIP Special Issue Recommendation
  • Open Attendance
  • Research-Purpose Use of the Submitted Raw Data

Started with NTCIR-3 CLIR, and will then be enlarged

  • Online Working Notes and Slides (from today)

…please send a PDF of your slides if you agree

ntcir4-ov 2004-06-02 16

Meeting Local Arrangements

*Local Arrangements: Keiko Watanabe
*Digital Poster: Tsuneaki Kato
*Boaster: Fumito Masui
*Pre-Meeting Lecture: Koji Eguchi
*Open Submission: Keita Tsuji
*Publication/Web: Haruko Ishikawa
*Oral Presentation: Takashi Koga
*Secretariat: Yuko Hayashi, Ai Kuba, Shigeko Tokuda, Tomoko Sonobe

Cross-Language Information Retrieval (CLIR) Task

Task Organizers: Kazuaki Kishida*, Kuang-hua Chen, Sukhoon Lee, Hsin-Hsi Chen, Koji Eguchi, Noriko Kando, Kazuko Kuriyama, Sung Hyon Myaeng

In Cooperation with: National Taiwan Univ, Korea Institute of Science and Technology Information

ntcir4-ov 2004-06-02 18

Design of CLIR Task

  • Purpose

– To promote research on cross-lingual information retrieval (CLIR) for East Asian languages

  • Languages

– Chinese (C), Japanese (J), Korean (K), English (E)

  • Subtasks

– Multilingual CLIR (MLIR): e.g., C - CJKE
– Bilingual CLIR (BLIR): e.g., C - J
– Single Language IR (SLIR): e.g., C - C
– Pivot Bilingual CLIR (PLIR): e.g., C - E - J

ntcir4-ov 2004-06-02 19

Test Collection

  • Document sets: news articles (1998-99)
    – Chinese: 381,681 docs
    – Japanese: 596,058 docs
    – Korean: 254,438 docs
    – English: 347,550 docs
  • Queries: 60 topics
    – TITLE-only run, DESC-only run, others
  • Relevance Judgments: 4 grades
    – Highly Relevant (S), Relevant (A), Partially Relevant (B), Non-Relevant (C)
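The evaluation slides later in this deck report results at the "Rigid" relevance level; as a sketch of how a 4-grade judgment is typically collapsed to binary relevance (the exact mapping of grades to the rigid/relaxed levels is an assumption here, not stated on the slide):

```python
def binarize(grade, level="rigid"):
    """Map an NTCIR-style 4-grade judgment to binary relevance.
    Assumed convention: 'rigid' counts S and A as relevant;
    'relaxed' additionally counts B (partially relevant)."""
    relevant = {"rigid": {"S", "A"}, "relaxed": {"S", "A", "B"}}
    return grade in relevant[level]
```

Precision/recall can then be computed separately at each level, which is why the later curves are labeled "Rigid".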


ntcir4-ov 2004-06-02 20

NTCIR-4 CLIR (2003-2004)

(Diagram: Traditional Chinese, Japanese, English, and Korean document collections published in 1998-1999; 60 topics with translations across J, E, C, K; 1.6 M docs, 3.3 GB in total.)

  • Short Q: D-only and T-only runs are mandatory
  • Background info of search requests
  • Balance btw topic types:
    • specific (e.g., a particular event) vs generic
    • proper nouns vs without PN
    • domestic / regional / international

ntcir4-ov 2004-06-02 21

Documents for CLIR at NTCIR

(Diagram comparing the CLIR document sets. NTCIR-3 (2002): Traditional Chinese, Japanese, Korean, and English news, 23K-250K docs per language, 870 MB in total, published in 1998-1999 (English: 1994). NTCIR-4 (2003-2004): 310K-590K docs per language, 3.3 GB in total, all published in 1998-1999; a good balance between the 4 languages, and every language is multi-source.)

ntcir4-ov 2004-06-02 22

Submission of results

  • 26 groups submitted results

– From Australia, Canada, China, Hong Kong, Japan, Korea, Singapore, Switzerland, Taiwan, USA (10 countries and areas)

  • No. of runs

– SLIR: 182 runs from 19 groups
– BLIR (or PLIR): 149 runs from 17 groups
– MLIR: 37 runs from 5 groups
– TOTAL: 368 runs

ntcir4-ov 2004-06-02 23

Techniques for CLIR

  • Indexing, Stop Words, Decompounding
  • Query and Document translation

– MT, MRD, Parallel corpora

  • Translation disambiguation
  • Out-of-vocabulary problem

– Use of Web resources
– Transliteration
– Cognates

  • Query expansion techniques

– Pseudo-relevance feedback
– Use of a knowledge ontology

  • Merging strategies
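The slide lists merging strategies without detail; in MLIR, a system must combine per-language result lists into one ranking. One common baseline is round-robin interleaving; a minimal sketch (an illustration only, not the method of any particular group):

```python
def round_robin_merge(ranked_lists, k=1000):
    """Interleave per-language result lists rank by rank,
    skipping duplicates, until k documents are collected."""
    if not ranked_lists:
        return []
    merged, seen = [], set()
    for rank in range(max(len(lst) for lst in ranked_lists)):
        for lst in ranked_lists:
            if rank < len(lst) and lst[rank] not in seen:
                seen.add(lst[rank])
                merged.append(lst[rank])
                if len(merged) == k:
                    return merged
    return merged
```

Score-based merging (normalizing per-language retrieval scores before sorting) is the usual alternative when scores are comparable across collections.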

ntcir4-ov 2004-06-02 24

Evaluation (2)

  • SLIR: C-C-D (Rigid) – top 8 groups

(Recall-precision curves of the top 8 groups' C-C-D runs at the rigid relevance level: I2R, OKI, pircs, RCUNA, UniNE, KLE, IFLAB, JSCCC.)
ntcir4-ov 2004-06-02 25

Evaluation (3)

  • SLIR: J-J-D (Rigid) – top 8 groups

(Recall-precision curves: PLLS, JSCCC, RCUNA, TSB, CRL, UniNE, KLE, BRKLY.)

ntcir4-ov 2004-06-02 26

Evaluation (4)

  • SLIR: K-K-D (Rigid) – top 8 groups

(Recall-precision curves: CRL, KLE, UniNE, pircs, HUM, IFLAB, FJUIR, tlrrd.)

ntcir4-ov 2004-06-02 27

Evaluation (5)

  • SLIR: E-E-D (Rigid) – top 8 groups

(Recall-precision curves: TSB, JSCCC, OKI, UniNE, pircs, CRL, HUM, IFLAB.)

ntcir4-ov 2004-06-02 28

Evaluation (6)

  • BLIR: C-J-D (Rigid) – top groups

(Recall-precision curves: TSB, BRKLY, NII, OKI, tlrrd.)

ntcir4-ov 2004-06-02 29

Evaluation (6)

  • BLIR – Comparison of MAP values between the best SLIR and best BLIR runs (D-runs, Rigid):

SLIR baselines: C-C .3255, J-J .3804, K-K .4685, E-E .3469
BLIR runs (MAP, % of the best SLIR on the same target language):
  J-C .0548 (16.8%)   K-C .1447 (44.5%)   E-C .0663 (20.4%)
  C-J .2309 (60.7%)   K-J .2935 (77.2%)   E-J .2674 (70.3%)
  C-K .3973 (84.8%)   J-K .3984 (85.0%)   E-K .3249 (69.3%)
  C-E .2238 (64.5%)   J-E .3340 (96.2%)   K-E .2250 (64.9%)

ntcir4-ov 2004-06-02 30

Conclusion

  • Various techniques for improving search

performance were used.

  • BLIR on Chinese document sets shows relatively low performance. Meanwhile, BLIR on Korean documents seems to reach a very high level.

  • Performance of the pivot-language approach is lower than that of non-pivot runs.

  • Performance of MLIR was low.

Patent Retrieval Task

Task Organizers: Atsushi Fujii (Univ of Tsukuba), Makoto Iwayama (TIT/Hitachi), Noriko Kando (NII)

In Cooperation with: Japan Intellectual Property Association (JIPA)


ntcir4-ov 2004-06-02 32

NTCIR-4 Patent (2003-2004)

DOCUMENTS: Japanese full texts (1993-2002) with authors' abstracts, plus English abstracts (1993-2002) by professional abstractors. Ca. 3.5 million docs, ca. 45 GB. Patents (claims); the 1993-97 portion is used for evaluation.

TOPICS: 40 manual + 100 automatic, in Japanese, English, Traditional Chinese, Simplified Chinese, and Korean.

Main: Search patents by patent
  • text retrieval + relevant passage pinpointing

Feasibility: automatic patent map creation
  • make a table from a set of relevant patents on a topic (more than 100 patents) to see the tech trends (text mining); a 3-year task

Formal Run Submission: Nov. 21, 2003

ntcir4-ov 2004-06-02 33

Process of producing test collection

(Diagram: a search topic feeds a preliminary search and the participants' runs (system 1, system 2, …) over the search target document collection; the search results are pooled; assessors (human experts) perform relevance judgment on the pooled results; the relevant docs plus topics form the test collection used for evaluation.)
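The pooling step above can be sketched as the union of the top-ranked documents from each submitted run; only this pool goes to the assessors (the default depth of 100 is an illustrative assumption, not the task's actual setting):

```python
def pool(runs, depth=100):
    """Union of the top-`depth` documents from each submitted
    run; the pooled set is what the assessors judge."""
    pooled = set()
    for ranked_docs in runs:
        pooled.update(ranked_docs[:depth])
    return pooled
```

Documents outside the pool are treated as unjudged (usually assumed non-relevant), which is why pool depth and run diversity matter for collection reusability.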

ntcir4-ov 2004-06-02 34

Search topics

  • Japanese patent application rejected by

Japanese Patent Office (JPO)

  • 34 topics were selected and judged by

members of “Japan Intellectual Property Association” (JIPA)

  • 69 additional topics: applications rejected by the JPO; only the citations were used
  • Very few relevant documents per topic
  • English, Korean, and simplified/traditional

Chinese translations were also produced for cross-language patent IR

ntcir4-ov 2004-06-02 35

Example search topic

<TOPIC> <NUM>008</NUM> <LANG>EN</LANG> <FDATE>19960527</FDATE> <CLAIM>(Claim 1) A sensor device, characterized in that an open recessed part is formed on a box-shaped forming base, a conductive film of a designated pattern is formed on the surface of the forming base including the inner surface of the recessed part, an element for a sensor is bonded to the recessed part, and the forming base is closed with a cover.</CLAIM> ... </TOPIC>

Relevant documents must be prior art, which had been open to the public before the topic patent was filed.

(Annotations on the slide: the claim is the target for invalidation; FDATE is the date of filing.)

ntcir4-ov 2004-06-02 36

Relevance judgment

  • Document-based relevance judgment
    – A: a patent that can invalidate the topic claim on its own
    – B: a patent that can invalidate the topic claim when used with other patents
  • Passage-based relevance judgment:
    – combinational relevance
  • Submitted runs were evaluated by mean average precision (MAP)
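Mean average precision, the measure named above, can be sketched as follows (a generic textbook formulation, not the task's exact evaluation script):

```python
def average_precision(ranked_docs, relevant):
    """Average precision of one ranked list: the mean of
    precision@k over the ranks k where a relevant document
    is retrieved, divided by the total number of relevant docs."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """MAP: average of per-topic average precision."""
    return sum(average_precision(runs[q], qrels[q]) for q in runs) / len(runs)
```

For example, retrieving relevant documents at ranks 1 and 3 out of 2 total relevant gives AP = (1/1 + 2/3) / 2 ≈ 0.833.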

ntcir4-ov 2004-06-02 37

Scenario of patent map generation

(Diagram: for a search topic, retrieve and classify documents (applications, JAPIO abstracts, PAJ) from the topics and documents in the NTCIR-3 collection into a multi-dimensional matrix, then visualize it.)


ntcir4-ov 2004-06-02 38

Example map (blue light-emitting diode)

(Table: rows are solutions (structure of the light-emitting element, electrode arrangement, electrode composition, structure of the active layer) and columns are problems to be solved (emission intensity, emission stability, long operating life, reliability, crystallinity); cells list patent application numbers such as 1998-012923 and 1998-247745. Given the documents, participants identify the lines and columns.)

ntcir4-ov 2004-06-02 39

Details of relevant documents (A)

(Diagram: sources of the A-relevant documents: citations, JIPA judgments (=ISJ*), and system pooling, with counts 19, 17, 25, 58, and 40 for the overlapping regions; the total number of documents is 159. *ISJ = Interactive Search and Judgment)

ntcir4-ov 2004-06-02 40

Question Answering Challenge (QAC-1)

Task Organizers: Jun'ichi FUKUMOTO, Tsuneaki KATO, Fumito MASUI

ntcir4-ov 2004-06-02 42

Question Answering Challenge at NTCIR

Subtask 1: 5 ordered answers; 195 Qs; eval by MRR
Subtask 2: 1 set of all the answers; 199 Qs. Return one set of only, and all, the correct answers. A Q may have multiple answers or no answer; a penalty is given for wrong answers. Eval by F-measure.
Subtask 3: series of questions; 251 Qs (36 series). Report-writing task: topic-centered vs browsing. Eval by F-measure.

  • Exact answers; return within 48 hours
  • Doc IDs are required as supporting information

NTCIR-3 (2002): used one news source. NTCIR-4 (2003-2004): two different news sources.
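The MRR evaluation of Subtask 1 (the reciprocal rank of the first correct answer among the 5 ordered answers, averaged over questions) can be sketched as:

```python
def mean_reciprocal_rank(ranked_answers, correct):
    """MRR: for each question, 1/rank of the first correct answer
    in the ranked list (0 if none is correct), averaged over all
    questions. `correct` maps question IDs to sets of gold answers."""
    total = 0.0
    for qid, answers in ranked_answers.items():
        rr = 0.0
        for rank, ans in enumerate(answers, start=1):
            if ans in correct.get(qid, set()):
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_answers)
```

So answering one question correctly at rank 2 and another at rank 1 yields (0.5 + 1.0) / 2 = 0.75.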

ntcir4-ov 2004-06-02 43

Submission

Subtask 1: 25 runs from 17 groups
Subtask 2: 14 runs from 9 groups
Subtask 3: 14 runs from 7 groups

Attributes of participating groups: Univ 11 (9 Japan, 2 international), Company 5, National Labs 2


ntcir4-ov 2004-06-02 44

Results: Subtask 1

MRR and the ratios of correct answers at rank 1 and within the top 5, per system

(Bar chart of the participating systems' Subtask 1 scores, including CRL, TKBQ, Forest, TSB, GDQA, RDNDC, smlab, NAIST, RitsQA, OKI, MAIQA, KLE, and NUT runs.)

Results: Subtask 2

(Bar chart of mean F-measure, mean precision, and mean recall (MF, MP, MR) per system: CRL, TKBQ, RDNDC, NAIST, smlab, RitsQA, MAIQA, OKI, iwate.)

ntcir4-ov 2004-06-02 46

Subtask 3: series of Qs (Background)

  • To make open-domain QA systems answer series of related questions rather than isolated questions
    – To be used for gathering/browsing information interactively
    – A useful aid to report writing and summarization
  • To evaluate QA systems' abilities for participating in information access dialogues
    – Context processing such as anaphora resolution and ellipsis handling
    – Objective and quantitative

ntcir4-ov 2004-06-02 47

Subtask 3: series of Qs

Situation Settings (User's Task)

  • 1. (Topic-oriented) Information gathering for writing a report on a specific topic
    – One (hidden) global topic and a series of Qs on its subtopics
  • 2. Browsing along transitive interests
    – The topic or focus of the Qs shifts through the interaction of the user and system
    – Local coherence with the previous Q only

Answering a series of Qs is closely related to multi-document summarization:
  – A series of Qs covers the subtopics that shall be contained in a summary, and can be used as "quality questions"
  – Summarization as pre-processing for QA?
  – QA as pre-processing for abstract-type summary generation?

ntcir4-ov 2004-06-02 48

Example of Series of Questions

  • When was Seiji Ozawa born?
  • Where was he born?
  • Which university did he graduate from?
  • Who did he study under?
  • Who recognized him?
  • Which orchestra was he conducting in 1998?
  • Which orchestra will he begin to conduct in 2002?

Series 14: Strictly Gathering Type

ntcir4-ov 2004-06-02 49

Example of Series of Questions

  • Which stadium is home to the New York Yankees?
  • When was it built?
  • How many persons' monuments have been displayed there?
  • Whose monument was displayed in 1999?
  • When did he come to Japan on honeymoon?
  • Who was the bride at that time?
  • Who often draws pop art using her as a motif?
  • Which company's can did he also often draw?

Series 22: Browsing Type


ntcir4-ov 2004-06-02 50

Problem on Evaluation

  • Questions: Which stadium is home to the NYY? When was it built?
  • Fact:
    The Yankee Stadium was built in 1923.
    The Shea Stadium was built in 1964.
  • System A's Answer
    – Which stadium is home to the NYY? The Shea Stadium ×
    – When was it built? 1964 ×
  • System B's Answer
    – Which stadium is home to the NYY? The Shea Stadium ×
    – When was it built? 1923 ○

ntcir4-ov 2004-06-02 51

Pragmatic Phenomena Observed

Phenomena (occurrences): No Reference Expression (36), Pronouns (76), Zero Pronouns (134), Definite Noun Phrases (11), Ellipses (7); total # of questions: 251

ntcir4-ov 2004-06-02 52

Evaluation by MMF

(Bar chart of MMF per system: CRL2, CRL1, TKBQ1, TKBQ2, TKBQ3, CRL3, RDNDC2, RITSE, RDNDC1, SMLAB, MAIQA2, MAIQA1, RITSN, OKI; scores broken down into Total / First / Rest.)

ntcir4-ov 2004-06-02 53

Differences on Series Type

(Bar chart per system (CRL2, CRL1, TKBQ1, TKBQ2, TKBQ3, CRL3, RDNDC2, RITSE, RDNDC1, SMLAB, MAIQA2, MAIQA1, RITSN, OKI) broken down by series type: Strictly Gathering, Gathering, Browsing.)

Text Summarization Challenge

Takahiro FUKUSIMA, *Tsutomu HIRAO, Hidetsugu NANBA, Manabu OKUMURA

ntcir4-ov 2004-06-02 55

Tasks

  • Extraction
    – Extracting important sentences from document sets; limitation: number of sentences
  • Abstraction
    – Producing summaries from document sets; limitation: number of characters
  • Two summary lengths: short and long; the limitations were given by the organizers


ntcir4-ov 2004-06-02 56

Data

  • 30 document clusters

– 30 event document clusters selected by organizers from both Mainichi and Yomiuri Newspaper articles

  • redundant sources
    – about 10 documents per topic

  • Data given to participants
    – 30 document clusters
    – Titles of the clusters
    – (Questions about important parts of the document clusters, generated by human summarizers)*
    * The participants decide whether to use this information or not

ntcir4-ov 2004-06-02 57

Participants

  • 9 participants from universities and governmental research organizations
    – 10 systems for extraction
    – 9 systems for abstraction
    – University: 6, Gov.+Univ.: 3

ntcir4-ov 2004-06-02 58

Evaluation Methods

  • Extraction

– Intrinsic evaluation

  • Precision, Coverage
  • Abstraction

– Intrinsic evaluation

  • Content : Information Coverage
  • Readability: Quality Questions (we modified DUC's QQ for Japanese text)

– Extrinsic evaluation

  • Pseudo-Question-Answering

ntcir4-ov 2004-06-02 59

Comparison among systems (short)

(Chart comparing the systems against the Human and Lead baselines.)

ntcir4-ov 2004-06-02 60

Comparison among topics (short)

ntcir4-ov 2004-06-02 61

Evaluation Results (Readability)

  • Q04: How many expressions have the same meaning but a different term?
    – System average: 2.05 (short), 4.16 (long)
    – Human: 0.433, 1.133
  • Q08: Does the summary have wrong chronological ordering? (yes: -1, no: +1, other: 0)
    – System average: -0.21, -0.58
    – Human: 0.933, 0.800


ntcir4-ov 2004-06-02 62

Web Retrieval Task

Task Organizers

Koji Eguchi (NII), Co-Chair; Keizo Oyama (NII), Co-Chair; Akiko Aizawa (NII), Haruko Ishikawa (NII), Masatoshi Arikawa (Tokyo U), Tsuyoshi Sagara (Tokyo U), Hayato Yamana (Waseda U)

ntcir4-ov 2004-06-02 63

NTCIR-4 WEB

WEB Task in NTCIR-4 at a Glance

(Subtask A) Informational Retrieval Task 2
(Subtask B) Navigational Retrieval Task 1
[Pilot Task] (Subtask C) Geographical Task 1
[Pilot Task] (Subtask D) Topical Classification Task 1: search-result classification, e.g., using clustering

Data sets:
– 'NW100G-01' (100 GB of Web page data crawled in 2001 from "*.jp") for Subtasks A and B
– 'Target data' (a comparatively small subset of NW100G-01) for Subtasks C and D

ntcir4-ov 2004-06-02 64

NTCIR-4 WEB

Informational Retrieval Task 2

  • Goal: To assess the effectiveness of "Subject Search".
  • Topics:
    – Delivered 300; mandatory runs: TITLE, ALT{0-3} (alternate queries) or DESC-only
    – Assessed 80 topics for 'Target', and 35 of them for 'Survey'
    – The remaining 48 are now under assessment
  • Active participants: 5 (+2 invited groups, +2 organizers' systems)
  • Submitted runs: 135
    – Almost all used content info only; some used anchor text; several used link info.
  • Pooling: Top 100 docs for 'Survey' and top 20 docs for 'Target' (their possible out-linked pages were also assessed.)
  • Rel. judgment: highly, fairly & partially relevant, irrelevant, and best 3 relevant. "Content duplication" was judged using automatically detected candidates.

ntcir4-ov 2004-06-02 65

NTCIR-4 WEB

Informational Retrieval Task 2

  • Evaluation measures:
    – Avg. prec. and DCG for 'Survey'.
    – Prec., DCG and WRR (MRR) @10 for 'Target'
  • Alternative evaluation as a trial:
    – Investigated evaluation measures reflecting users' intuition as 'user-oriented evaluation measures'. (See Ohtsuka in the working notes.)
  • Future work:
    – Analyze the evaluation results further
    – Tune parameters of WRR ('Weighted Reciprocal Rank')
    – Verify stability of evaluation measures
    – Check comprehensiveness of assessment results
    – Analyze topic-by-topic behavior of each system
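The DCG measure used for these runs can be sketched in its classic discounted-cumulative-gain form; the gain assigned to each relevance grade is the evaluator's choice and is not stated on the slide, so the example gains below are assumptions:

```python
import math


def dcg(gains, cutoff=10):
    """Discounted cumulative gain: the gain at rank 1 is taken
    as-is; gains at later ranks are divided by log2(rank)."""
    score = 0.0
    for rank, g in enumerate(gains[:cutoff], start=1):
        score += g if rank == 1 else g / math.log2(rank)
    return score
```

For instance, gains [3, 2, 0, 1] give 3 + 2/log2(2) + 0 + 1/log2(4) = 5.5; unlike average precision, DCG rewards graded (not just binary) relevance near the top of the ranking.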

(RP curves and DCG curves of TITLE-only runs at the rigid relevance level. Runs: DBLAB, GRACE, SSTUT, R2D2, ORGREF, TKB, K31, OKSAT, NAMAZU.)

ntcir4-ov 2004-06-02 66

NTCIR-4 WEB

Navigational Retrieval Task 1

  • Goal: To assess the retrieval effectiveness of "Known Item Search".
  • Topics:
    – Delivered 300; mandatory run: using TITLE only.
    – Assessed 144, and used 87 or 72 for evaluation depending on the document set definition. The remaining 156 are now under assessment.
  • Active participants: 5 (+organizers)
  • Submitted runs: 16 (+68 by organizers).
    – 8 (+24) runs used anchor text, 4 (+40) runs used link info, 4 (+4) runs used content info only.
  • Pooling: Top 10 docs (their possible out-linked pages were also assessed.)
  • Rel. judgment: relevant, partially relevant, non-relevant. "Representativeness" was judged based on all available information, e.g., the provider of the page, content (text, images, etc.), URL, and out-linked pages.

ntcir4-ov 2004-06-02 67

NTCIR-4 WEB

Navigational Retrieval Task 1

  • Evaluation measures: DCG and MRR at the top-10 doc level
  • Evaluation results:
    – Tendency on MRR & DCG
      • Several anchor-based systems performed best.
      • Several link-based systems performed fairly.
      • Content-based systems performed poorly.
    – Tendency on ranks of found rel. docs
      • Anchor-based systems tend to be able to pinpoint rel. docs.
      • Link-based and content-based systems tend to return noisy results.
  • Future work:
    – Verify stability of evaluation measures
    – Check comprehensiveness of assessment results
    – Study evaluation measures reflecting users' intuition
    – Analyze topic-by-topic behavior of each system
  • WEB Navigational Retrieval Task at NTCIR-5:
    – Use a larger Web page data set (300 GB~1 TB?)
    – Crawl each site more deeply (>10k pages/site?)
    – Reinforce "Open Laboratory"

(Charts comparing anchor/link-info, link-info, and content-only runs against the ideal.)

ntcir4-ov 2004-06-02 68

Challenges in Information Access

Scaling up; Beyond the Heterogeneity; Beyond "Document" Retrieval

Language, media, document genres, etc.: appreciate each difference

"Needs" Behind the Queries

*** Evaluation methodology and metrics must reflect the social needs for the technologies. ***

Answers/info in documents; user's situation, task, problem; beyond "topic" and "fact"

ntcir4-ov 2004-06-02 69

Contact Info & Online Proceedings

The documents used are in Asian languages, but participation from all over the world is more than welcome!! Links to freely available resources are available. Many of them have English manuals and have been used by non-Asian active participants.

Inquiries: Noriko Kando at kando@nii.ac.jp, or the NTCIR-Admin Group at ntcdam@nii.ac.jp

Online Proceedings & other info: http://research.nii.ac.jp/ntcir/

ntcir4-ov 2004-06-02 70

Thanks Merci Danke schön Grazie Gracias Tack Köszönöm Kiitos Terima Kasih Khap Khun Ahsante Tak 謝謝 ありがとう

http://research.nii.ac.jp/ntcir/