Introduction to NTCIR-7
Noriko Kando
National Institute of Informatics, Japan
http://research.nii.ac.jp/ntcir/
kando (at) nii.ac.jp
NTC intro 2008-12-16 Noriko Kando 1
Road map
NII Test Collection for Information Retrieval
Research Infrastructure for Evaluating IA
■Data sets, evaluation methodologies, and forum
Once every 18 months
Scientific, news, patents, and web
Chinese, Korean, Japanese, and English
IR: Cross-lingual tasks, patents, web
QA: Monolingual tasks, cross-lingual tasks
Summarization, trend info., patent maps
Opinion analysis, text mining
NTCIR-7 participants
82 groups from 15 countries
Document set, a set
for each topic
Tasks
Japanese IR, Cross-lingual IR (news, sci)
Patent Retrieval, map/classif
Web Retrieval: Navigational, Geo, Result Classification
Question Answering, Info Access Dialog
Term Extraction
Text Summarization, Summ metrics
Cross-Lingual, Trend Information
Opinion Analysis
[Figure: number of participating groups per task, from the 1st NTCIR (1998-) to the 7th. Tasks in legend: Japanese IR, Non-Japanese IR, Cross-lingual IR, Patent Retrieval, Patent MT, Patent Mining, Web Retrieval, Term Extraction, Summarization, Trend Info, QA, CLQA, ACLIA (IR4QA, CCLQA), Opinion. Language annotations: Chinese, Korean, JE, EJ, EC, xCJEK. y-axis: # of participating groups.]
Mark Sanderson, Doug Oard, Atsushi Fujii, Tatsunori Mori, Fred Gey, Noriko Kando (and others)
CLEF2008 2008-09-18 Noriko kando 14
[Figure: NTCIR-7 participating organizations by task. Task labels: [CCLQA], [IR4QA], [PAT MIN], [PAT MT], [MuST], [MOAT]. Legible organization names: Beijing Univ of Posts & Telecoms, China; Information and Communications Univ; NEC; Hiroshima City Univ; Univ of California Berkeley; Harbin Institute of Technology + Heilongjiang Institute of Technology; Wuhan Univ of Science and Technology; Keio Univ; NTT; Univ of Sheffield; NICT; IACAS; Polytechnic Univ + Tsinghua Univ.]
– Each participating research group conducts experiments with various approaches and can participate with its own purpose.
ICCC2007 2007-10-13 Noriko kando 25
[Diagram: workshop workflow. Labels: documents (with permission, MOU); topics/questions; relevance assessment (correct answers); test collections; run submission (experimental results = runs); evaluation; reports, papers, bibliographies; publications in conferences, journals, etc.]
<DOC>
<DOCNO>ctg_xxx_19990110_0001</DOCNO>
<LANG>EN</LANG>
<HEADLINE> Asia Urged to Move Faster in Shoring Up Shaky Banks </HEADLINE>
<DATE>1999-01-10</DATE>
<TEXT>
<P>HONG KONG, Jan 10 (AFP) - Bank for International Settlements (BIS) general manager Andrew Crockett has urged Asian economies to move faster in reforming their shaky banking sectors, reports said Sunday. Speaking ahead of Monday's meeting at the BIS office here of international central bankers including US Federal Reserve chairman Alan Greenspan, Crockett said he was encouraged by regional banking reforms but "there is still some way to go " Asian banks shake off their burden of bad debt if they were to be able to finance recovery in the crisis-hit region, he said according to the Sunday Morning Post. Crockett added that more stable currency exchange rates and lower interest rates had paved the way for recovery. "Therefore I believe in the financial area, the crisis has in a sense been contained and that now it is possible to look forward to real economic recovery," he was quoted as saying by the Sunday Hong Kong Standard.</P>
<P>"It would not surprise me, given the interest I know certain governors have, if the subject [...]</P>
<P>He reiterated comments by BIS officials here that the central bankers would stay tight-lipped about their meeting, the first to be held at the Hong Kong office of the Swiss-based institution since it opened last July. </P>
</TEXT>
</DOC>
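The tagged record above can be read with a few regular expressions. The following is only a minimal sketch, not an official NTCIR tool: the function name `parse_doc` and the abbreviated `sample` record are hypothetical, and it assumes each field appears at most once per record and that paragraphs are properly closed.

```python
import re
from typing import Dict

# Hypothetical helper for illustration; field names are taken from the sample record.
FIELD_RE = {
    name: re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    for name in ("DOCNO", "LANG", "HEADLINE", "DATE")
}
PARA_RE = re.compile(r"<P>(.*?)</P>", re.DOTALL)


def parse_doc(record: str) -> Dict[str, object]:
    """Extract the header fields and paragraphs from one <DOC> record."""
    doc: Dict[str, object] = {}
    for name, pattern in FIELD_RE.items():
        match = pattern.search(record)
        # Collapse internal whitespace; None when the field is absent.
        doc[name.lower()] = " ".join(match.group(1).split()) if match else None
    doc["paragraphs"] = [" ".join(p.split()) for p in PARA_RE.findall(record)]
    return doc


# Abbreviated record; the paragraph text is a placeholder, not the original article.
sample = """<DOC>
<DOCNO>ctg_xxx_19990110_0001</DOCNO>
<LANG>EN</LANG>
<HEADLINE> Asia Urged to Move Faster in Shoring Up Shaky Banks </HEADLINE>
<DATE>1999-01-10</DATE>
<TEXT>
<P>HONG KONG, Jan 10 (AFP) - placeholder paragraph text.</P>
</TEXT>
</DOC>"""

parsed = parse_doc(sample)
print(parsed["docno"], parsed["lang"], parsed["date"])
```

Regular expressions are used here rather than an XML parser because records of this kind are not guaranteed to be well-formed XML.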
[Figure: 11-point recall-precision curves by retrieval system (systems A to P), J-J Level1 D auto, averaged over 50 topics. x-axis: recall; y-axis: precision.]
[Figure: mean average precision by topic (systems A to P), J-J Level1 D auto, 50 topics. x-axis: topic #, requests #101-150.]
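For readers unfamiliar with the 11-point recall-precision curve plotted above, here is a minimal sketch of how one such curve can be computed for a single topic. It uses the common interpolation rule (precision at recall level r is the maximum precision at any recall of at least r); the function name and the toy document IDs are hypothetical, and this is not the workshop's official evaluation script.

```python
from typing import List, Sequence, Set


def eleven_point_precision(ranking: Sequence[str], relevant: Set[str]) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one topic."""
    if not relevant:
        return [0.0] * 11
    points = []  # (recall, precision) observed at each relevant document
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolation: best precision achieved at this recall level or beyond.
    return [
        max((p for r, p in points if r >= level - 1e-12), default=0.0)
        for level in levels
    ]


# Toy example: two relevant documents, found at ranks 1 and 4.
curve = eleven_point_precision(["d1", "d2", "d3", "d4", "d5"], {"d1", "d4"})
print(curve)
```

Averaging such curves over all topics, level by level, gives one curve per system.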
Test Collections
Sharing modules, prototypes
Laboratory-type testing / Prototype testing / Interactive testing using human subjects / Pre-operational testing
Controlled / Uncontrolled
In vitro / Animal experiments / Clinical test
2. Input Level, 4. User Level, 5. Output Level, 6. Social Level
Translation, Mining (NTCIR-7)
J Collection
E Collection
No paired docs
Published in 2000-2001
[Diagram: document collections. Labels: full text (1998, 99), 18 GB; languages: Japanese, English, Chinese (trad), Chinese (simp), Korean; full text with author's abstract (in Japanese); translation (1995-99); abstract (in Japanese), 1.7 million docs; abstract (in English), 1.7 million docs; 1995-97 are usable for translation.]
[Diagram: patent collections, 1993-2002. Labels: ca. 7 M docs; (more than 1000); text retrieval + relevant passage pinpointing; Japanese: full text with author's abstract (in Japanese); English: abstract (in English) by professional abstractors, 7 million docs, 5 GB.]
Columns: crystalline | reliability | long life | emission stability | emission intensity
Structure of active layer: 1998-145000, 1998-233554
Electrode composition: 1998-107318, 1998-190063, 1998-209498, 1998-209495
Electrode arrangement: 1998-215034, 1998-223930, 1998-242518, 1998-173230, 1998-209499, 1998-256602, 1998-242515, 1998-270757
Structure of light emitting element: 1998-135516, 1998-242586, 1998-247761, 1998-135514, 1998-256668, 1998-012923, 1998-247745, 1998-256597
[Figure: average precision by retrieval model for topic field PATENT: <DESCRIPTION>. x-axis: retrieval model (hits, baseline, tf, idf, tf.idf, log(tf), log(tf).idf, log(tf).idf+dl, BM25); y-axis: average precision; series: full, abs, claim, abs+claim, jsh.]
*abs=author abstracts, jsh=professional abstracts
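As a reminder of what one of the compared retrieval models computes, the sketch below scores documents with a textbook form of BM25. It is only an illustration: the parameter values `k1=1.2` and `b=0.75`, the smoothed idf variant, the function name, and the toy token lists are all assumptions, not the settings or data used in the workshop.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(
    query: List[str],
    docs: Dict[str, List[str]],
    k1: float = 1.2,   # assumed term-frequency saturation parameter
    b: float = 0.75,   # assumed document-length normalization parameter
) -> Dict[str, float]:
    """Score every document against the query with a textbook BM25 formula."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores: Dict[str, float] = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


# Toy, hypothetical token lists standing in for indexed patent fields.
toy_docs = {
    "p1": ["blue", "led", "electrode"],
    "p2": ["red", "laser"],
    "p3": ["blue", "diode", "blue"],
}
scores = bm25_scores(["blue", "electrode"], toy_docs)
print(sorted(scores, key=scores.get, reverse=True))
```

The function assumes a non-empty collection; swapping `toy_docs` for abstracts, claims, or full text corresponds to the different series in the figure.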
[Figure: recall-precision curves. x-axis: recall; y-axis: precision. Runs (value in parentheses as shown in the legend): NCS02 (0.4852), GATE03 (0.4779), NICT01 (0.4518), JSPAT01 (0.4381), NUT05 (0.4101), RDNDC14 (0.2717), baseline (0.2821).]
Genre | Subjectivity | Holder | Polarity | Strength
News | NTCIR-6 | NTCIR-6 | NTCIR-6 |
Blog | NTCIR-7 | NTCIR-7 | NTCIR-7 |
Cross-genre | NTCIR-8 | NTCIR-8 | NTCIR-8 | NTCIR-8

Stakeholder: NTCIR-7, NTCIR-8
Temporal: NTCIR-8
Language: C, J, E
Granularity: single-sentence, clause, multi-sentence, document
Application: Summarization, QA, Opinion tracking, Consistency checking, Trend
QA Effectiveness
OOV, PRF, QE in QA
Genre | Subjectivity | Holder | Polarity | Strength
News | NTCIR-6 | NTCIR-6 | NTCIR-6 |
Review | NTCIR-7 | NTCIR-7 | NTCIR-7 | NTCIR-7
Blog | NTCIR-8 | NTCIR-8 | NTCIR-8 | NTCIR-8

Stakeholder: NTCIR-7, NTCIR-8
Temporal: NTCIR-8
Language: Chinese, English, Japanese (CJE)
Granularity: single-sentence, clause, multi-sentence, document
Application: Summarization, QA, Opinion tracking, Consistency checking, Trend
Beijing University of Posts and Telecommunications
Chinese Academy of Sciences (NLPR-IACAS)
City University of Hong Kong
CUHK (The Chinese University of Hong Kong) - PolyU (The Hong Kong Polytechnic University) - Tsinghua (Tsinghua University)
DAEDALUS, S.A.
Dalian University of Technology
Hiroshima City University
Information and Communications University
Keio University
Louisiana State University (University of Maryland College Park)
National Taiwan University
NEC
NEU Natural Language Processing Lab
Peking University
Peking University (ICL)
Pohang University of Science and Technology
SICS - Swedish Institute of Computer Science
Technical University of Darmstadt
The Graduate University for Advanced Studies (SOKENDAI)
Tornado Technologies Co., Ltd., Taiwan
Toyohashi University of Technology
University of Neuchatel
University of Sussex
Yuan Ze Univ.
80+ registered, 30+ resigned when docs were changed, 42 registered for News MOAT, 24 submitted
Training: 1993-2000, Test: 2001-2002
One Ref Trans good??
Extrinsic Eval: CLIR task-based
Example: Blue light-emitting diodes
Columns: Crystalline | Reliability | Long life | Emission stability | Emission intensity
Structure of active layer: 1998-145000, 1998-233554
Electrode composition: 1998-107318, 1998-190063, 1998-209498, 1998-209495
Electrode arrangement: 1998-215034, 1998-223930, 1998-242518, 1998-173230, 1998-209499, 1998-256602, 1998-242515, 1998-270757
Structure of light emitting element: 1998-135516, 1998-242586, 1998-247761, 1998-135514, 1998-256668, 1998-012923, 1998-247745, 1998-256597
– Increased the number of documents and topics (108 topics)
– Evaluate partial matches in F-term hierarchy
application
JAPIO abst PAJ
PAJ
Patents with themes and F-terms (1993-1997)
Patents with themes and F-terms (1998-1999)
Sampling
PMGS (F-term descriptions)
[Diagram: a participant system. Input: research paper written in Japanese (Japanese / J2E subtasks) or research paper written in English (English / E2J subtasks). Components: machine-translation module (E2J / J2E); text classification module; patent data (Japanese / J2E, English / E2J). Output: list of IPC codes.]
Nanba, Fujii, Iwayama, and Hashimoto. “The Patent Mining Task in the Seventh NTCIR Workshop”, Patent Information Retrieval Workshop at CIKM 2008 (2008)
Fujii, Utiyama, Yamamoto, and Utsuro. “Toward the Evaluation of Machine Translation Using Patent Information”, AMTA 2008
[Diagram: extrinsic evaluation of MT through CLIR. Labels: NTCIR-5 patent claim; search topic in English; search topic in Japanese; MT system (training data: 1.8-M sentence pairs); translation in Japanese, evaluation by BLEU; IR system over JPO applications 1993-2002; ranked doc. list, evaluation by Mean Average Precision (MAP).]
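Mean Average Precision, the measure named above for the ranked document lists, can be sketched as follows. This is a minimal illustration with binary relevance; the function names, topic IDs, and document IDs are hypothetical, and topics with no run entries are scored as zero by assumption.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-topic average precision over all topics in qrels."""
    if not qrels:
        return 0.0
    return sum(
        average_precision(run.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    ) / len(qrels)


# Toy run and relevance judgments for two hypothetical topics.
toy_run = {"t1": ["a", "b", "c"], "t2": ["x", "y"]}
toy_qrels = {"t1": {"a", "c"}, "t2": {"y"}}
map_score = mean_average_precision(toy_run, toy_qrels)
print(round(map_score, 4))
```

In the setup described above, the same computation would be applied once to the list retrieved with the machine-translated topic and once to the list retrieved with the original Japanese topic, so that the two scores can be compared.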
– SMT is much better for CLIR
– Rule-based MT is good for human evaluations
*SMT: a system that automatically learns the translation rules from the given large-scale sentence pairs.