Overview of the 7th NTCIR Workshop
Noriko Kando
National Institute of Informatics, Japan
http://research.nii.ac.jp/ntcir/
kando (at) nii.ac.jp
With thanks to Tetsuya Sakai for the slides
NTC7 OV 2008-12-17 Noriko Kando 1
NII Test Collection for Information Retrieval
Research Infrastructure for Evaluating IA
■Data sets, evaluation methodologies, and forum
Once every 18 months
Scientific, news, patents, and web
Chinese, Korean, Japanese, and English
IR: Cross-lingual tasks, patents, web
QA: Monolingual tasks, cross-lingual tasks
Summarization, trend info., patent maps
Opinion analysis, text mining
[Figure: # of groups and # of countries at the 1st through 7th NTCIR workshops]
NTCIR-7 participants
82 groups from 15 countries
Tasks
Japanese IR and Cross-lingual IR (news, sci)
Patent Retrieval (map/classif)
Web Retrieval (Navigational, Geo, Result Classification)
Question Answering
Info Access Dialog
Term Extraction
Text Summarization (Summ metrics)
Cross-Lingual
Trend Information
Opinion Analysis
– Each participating research group conducts experiments with various approaches and can participate for its own purposes.
[Figure: 11-point recall-precision by retrieval system (J-J, Level 1, D, auto), systems A to P, average over 50 topics; axes: recall vs. precision]
[Figure: mean average precision per topic for the same runs (J-J, Level 1, D, auto), requests #101-150; axes: topic # vs. mean average precision]
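As a rough sketch of how an 11-point recall-precision curve like the ones plotted here is usually computed, the Python below takes one ranked list of binary relevance judgments and returns interpolated precision at recall 0.0, 0.1, ..., 1.0. The function name, the toy ranking, and the choice of the common "maximum precision at any recall at or above the level" interpolation are assumptions for illustration; the slides do not state which variant was used.

```python
def eleven_point_precision(ranked_relevance, num_relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one topic.

    ranked_relevance: 0/1 judgments in rank order (hypothetical input format).
    num_relevant: total number of relevant documents for the topic.
    """
    # Record (recall, precision) each time a relevant document is retrieved.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / num_relevant, hits / rank))

    curve = []
    for step in range(11):
        level = step / 10
        # Interpolation: best precision achieved at this recall level or beyond.
        reachable = [p for r, p in points if r >= level - 1e-12]
        curve.append(max(reachable, default=0.0))
    return curve


# Toy run: 2 relevant documents in total, retrieved at ranks 1 and 3.
toy_run = [1, 0, 1, 0, 0]
curve = eleven_point_precision(toy_run, num_relevant=2)
print(curve)
```

Averaging such curves over all topics (50 in the figure) gives one line per system.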
Test Collections / Sharing Modules / Controlled Interactive / Uncontrolled Pre-operational
Laboratory-type Testing / Modules, Prototype Testing / Interactive Testing Using Human Subjects / Pre-operational Testing
In Vitro / Animal Experiments / Clinical Test
2. Input Level, 4. User Level, 5. Output Level, 6. Social Level
QA Effectiveness
OOV, PRF, QE in QA
QA teams using IR4QA runs from other teams
Persistence Parameter β
Parameter β set to 1
Monolingual Crosslingual
CSWHU-CS-CS-01-T:
<KEYTERMS>
<KEYTERM SCORE="1.0">宇宙大爆炸</KEYTERM>
<KEYTERM SCORE="0.3">理论</KEYTERM>
</KEYTERMS>
Apath-CS-CS-01-T:
<KEYTERMS>
<KEYTERM SCORE="1.0">宇宙大爆炸理论</KEYTERM>
</KEYTERMS>
CMUJAV-CS-CS-01-T:
<KEYTERMS>
<KEYTERM SCORE="1.0">宇宙</KEYTERM>
<KEYTERM SCORE="1.0">大</KEYTERM>
<KEYTERM SCORE="1.0">爆炸</KEYTERM>
<KEYTERM SCORE="1.0">理论</KEYTERM>
<KEYTERM SCORE="1.0">宇宙 大 爆炸 理论</KEYTERM>
<KEYTERM SCORE="1.0">宇宙大爆炸理论</KEYTERM>
<KEYTERM SCORE="1.0">宇宙 大 爆炸</KEYTERM>
<KEYTERM SCORE="1.0">宇宙大爆炸</KEYTERM>
</KEYTERMS>
[Figure panels: CS, CT, JA]
By definition, nDCG is more forgiving of low-recall runs than AP and Q.
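A small worked example may make the low-recall point concrete. The sketch below uses the commonly cited definitions of AP, Q-measure (with persistence parameter β) and nDCG with a log2 discount; the slides do not spell these formulas out, so the exact variants, the function names, and the toy gain values are all assumptions. The run retrieves only 1 of 10 relevant documents, at rank 1.

```python
import math


def average_precision(gains, num_relevant):
    """AP: precision at each relevant rank, divided by the number of relevant docs."""
    hits, total = 0, 0.0
    for rank, gain in enumerate(gains, start=1):
        if gain > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant


def q_measure(gains, ideal_gains, beta=1.0):
    """Q-measure: like AP, but each relevant rank contributes a blended ratio
    (hits + beta * cumulative gain) / (rank + beta * ideal cumulative gain)."""
    num_relevant = sum(1 for g in ideal_gains if g > 0)
    hits, cg, ideal_cg, total = 0, 0.0, 0.0, 0.0
    for rank, gain in enumerate(gains, start=1):
        cg += gain
        if rank <= len(ideal_gains):
            ideal_cg += ideal_gains[rank - 1]
        if gain > 0:
            hits += 1
            total += (hits + beta * cg) / (rank + beta * ideal_cg)
    return total / num_relevant


def ndcg(gains, ideal_gains, cutoff=1000):
    """nDCG with a 1 / log2(rank + 1) discount at the given cutoff."""
    def dcg(values):
        return sum(g / math.log2(rank + 1)
                   for rank, g in enumerate(values[:cutoff], start=1))
    return dcg(gains) / dcg(ideal_gains)


# Hypothetical topic: 10 relevant documents, all with gain 1.
ideal = [1.0] * 10
# Low-recall run: one relevant document at rank 1, nothing else relevant.
low_recall_run = [1.0] + [0.0] * 9

ap = average_precision(low_recall_run, num_relevant=10)
q = q_measure(low_recall_run, ideal, beta=1.0)
n = ndcg(low_recall_run, ideal)
print(ap, q, n)
```

AP and Q both divide by the full number of relevant documents, so this run scores 0.1 on each. nDCG divides by the ideal discounted gain, in which deeper ranks count for less, so the same run scores roughly 0.22.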
Genre: News (NTCIR-6), Review (NTCIR-7), Blog (NTCIR-8)
Aspects: Subjectivity, Holder, Polarity, Strength, Stakeholder, Temporal
Language: Chinese, English, Japanese, CJE
Granularity: single-sentence, clause, multi-sentence, document
Application: Summarization, QA, Opinion tracking, Consistency checking, Trend
Beijing University of Posts and Telecommunications
Chinese Academy of Sciences (NLPR-IACAS)
City University of Hong Kong
CUHK (The Chinese University of Hong Kong) - PolyU (The Hong Kong Polytechnic University) - Tsinghua (Tsinghua University)
DAEDALUS, S.A.
Dalian University of Technology
Hiroshima City University
Information and Communications University
Keio University
Louisiana State University (University of Maryland College Park)
National Taiwan University
NEC
NEU Natural Language Processing Lab
Peking University
Peking University (ICL)
Pohang University of Science and Technology
SICS - Swedish Institute of Computer Science
Technical University of Darmstadt
The Graduate University for Advanced Studies (SOKENDAI)
Tornado Technologies Co., Ltd., Taiwan
Toyohashi University of Technology
University of Neuchatel
University of Sussex
Yuan Ze Univ.
80+ registered, 30+ withdrew when the documents were changed, 42 registered for News MOAT, 24 submitted
Training: 1993-2000, Test: 2001-2002
One reference translation: good??
Extrinsic Eval: CLIR task-based
Example: Blue light-emitting diodes
Crystalline, Reliability, Long life, Emission stability, Emission intensity
Structure of active layer: 1998-145000, 1998-233554
Electrode composition: 1998-107318, 1998-190063, 1998-209498, 1998-209495
Electrode arrangement: 1998-215034, 1998-223930, 1998-242518, 1998-173230, 1998-209499, 1998-256602, 1998-242515, 1998-270757
Structure of light emitting element: 1998-135516, 1998-242586, 1998-247761, 1998-135514, 1998-256668, 1998-012923, 1998-247745, 1998-256597
– Increased the number of documents and topics (108 topics)
– Evaluate partial matches in F-term hierarchy
[Diagram labels: application, JAPIO abstract, PAJ]
Patents with themes and F-terms (1993-1997)
Patents with themes and F-terms (1998-1999)
Sampling
PMGS (F-term descriptions)
Input: research paper written in Japanese (Japanese / J2E subtasks) or research paper written in English (English / E2J subtasks)
A participant system: machine-translation module (E2J / J2E), text classification module
Patent data written in Japanese
Output: list of IPC codes
Nanba, Fujii, Iwayama, and Hashimoto. “The Patent Mining Task in the Seventh NTCIR Workshop”, Patent Information Retrieval Workshop at CIKM 2008 (2008)
Fujii, Utiyama, Yamamoto, and Utsuro. “Toward the Evaluation of Machine Translation Using Patent Information”, AMTA 2008
problems to patent data
Addressed patent-specific IR problems
NTCIR-5 patent claim: search topic in English
MT system (training data: 1.8 M sentence pairs): translation into Japanese, evaluation by BLEU
Search topic in Japanese
IR system over JPO applications 1993-2002: ranked documents, evaluation by Mean Average Precision (MAP)
translating
JPO: Aug 22, 2008
USPTO: Oct 24, 2008
claiming priority
– SMT is much better for CLIR
– Rule-based MT is good for human evaluations
*SMT: a system that automatically learns the translation rules from the given large-scale sentence pairs.
CLEF2008 2008-09-18 Noriko kando 58
Interaction of Users
[Figure: # of participating groups (0 to 120) by task across NTCIR workshops. Tasks shown: Japanese IR, CLIR, Non-Japanese IR (Chinese, Korean, JE, EJ, EC, xCJEK), Patent Retrieval, Patent Mining, Patent MT, Web Retrieval, Term Extraction, Summarization, Trend Info, QA, CLQA, ACLIA (CCLQA, IR4QA), Opinion]
[Slides: participating groups by task, with task labels CCLQA, IR4QA, PAT MIN, PAT MT, MuST, and MOAT. Groups named: Beijing Univ of Posts & Telecoms, China; Information and Communications Univ; NEC; Hiroshima City Univ; Univ of California Berkeley; Harbin Institute of Technology + Heilongjiang Institute of Technology; Wuhan Univ of Science and Technology; Keio Univ; NTT; Univ of Sheffield; NICT; Polytechnic Univ + Tsinghua Univ]
Mark Sanderson, Doug Oard, Atsushi Fujii, Tatsunori Mori, Fred Gey, Noriko Kando (and others)