 
Overview of the Sixth NTCIR Workshop
Noriko Kando
National Institute of Informatics
http://research.nii.ac.jp/ntcir/
kando (at) nii.ac.jp
ntcir6 2006-05-16 Noriko Kando
NTCIR Workshop is:
A series of evaluation workshops designed to enhance research in information access technologies by providing infrastructure for large-scale evaluation.
Project started in late 1997; held once every 1½ years.
- 1st: Nov. 1, 1998 – Sept. 1, 1999
- 2nd: June 2000 – March 2001
- 3rd: Sept. 2001 – Oct. 2002
- 4th: Apr. 2003 – June 2004
- 5th: Oct. 2004 – Dec. 2005
- 6th: April 2006 – June 2007
* NTCIR: NII Test Collection for Information Retrieval systems
Focus of NTCIR
Lab-type IR Test
- Asian Languages/cross-language
- Variety of Genre
- Parallel/comparable Corpus
New Challenges
- Intersection of IR + NLP
- To make information in the documents more usable for users!
- Realistic eval/user task
Forum for Researchers
- Idea Exchange
- Discussion/Investigation on Evaluation methods/metrics
Tasks (Research Areas) of NTCIR Workshops
[Figure: chart of tasks against the 1st to 6th workshops. Task labels: Japanese IR (news, sci); Cross-lingual IR; Patent Retrieval (map/classif); Web Retrieval (Navigational, Geo, Result Classification); Term Extraction; Question Answering (Info Access Dialog, Cross-Lingual); Text Summarization (Summ metrics); Trend Information; Opinion Analysis.]
NTCIR-6 (Mtg: May 15-18, 2007)
• CLIR: multi-collection, NTC3-5, news docs, CJK
• CLQA: E-C, C-C, C-E, E-J, J-J, J-E (factoid)
• Opinion: CJE, reuse NTC3-5 CLIR
• Patent Retrieval:
  – Invalidity Search, 10 yr patent full text, ca. 90 GB
  – Text Categorization to F-terms (good granularity for patent map axis)
• QAC: Every kind of Qs (J-J), eval by BE
• [Pilot] MuST: MUltimodal Summarization for Trend information; extract numeric information from a set of documents and visualize it to show trends
NTCIR-6 Schedule
Task                Lang  Formal Run
CLIR                CKJ   Done
CLQA                CJE   Nov 1-7, 2006
Opinion             CJE   late Dec.
Patent (IR, CL)     JE    Oct 2006
QA                  J     Sept 25 - Oct 20, 2006
Trend Info (MuST)   J     Dec 2006 (March 2007)
Meeting (all tasks): May 15-18, 2007
NTCIR workshop: Number of Participating Groups
[Figure: horizontal bar chart, x-axis 0 to 100; series: # of registered, # of groups, # of countries.]
Workshop       # of groups  # of countries
1st workshop   28           6
2nd workshop   36           8
3rd workshop   65           9
4th workshop   74           10
5th workshop   77           15
6th workshop   85           12   (registered: 104)
Number of Active Participants by Tasks
[Figure: stacked bar chart of the number of participating groups (y-axis, 0 to 120) for the 1st to 6th workshops, broken down by task. Legend: Japanese IR, CLIR, Non-Japanese IR, Patent Retrieval, Web Retrieval, Term Extraction, Summarization, MuST, QA, CLQA, Opinion. Bar annotations mark the languages covered, including Chinese and Korean.]
Active Participants
15 newcomers (13 are international). Many returns.
[CLIR] [MuST] [PATENT] Academia Sinica; Chinese Academy of Sciences (ISCAS); Hiroshima City Univ; Hiroshima City Univ; Huazhong Normal Univ; Justsystem Corporation; Hitachi, Ltd; Hummingbird; Keio Univ (saito); Justsystem Corporation; Institute for Infocomm Research; Mie Univ; Nagaoka Univ of Technology; Justsystem Corporation; NICT; NICT; National Central Univ; NEC (Internet Systems Research Labs); National Taiwan Normal Univ; NICT; NTT DATA; National Taiwan Normal Univ; Ochanomizu Univ (2 groups); NTT-CS; Newswatch, Co.; Okayama Univ; POSTECH; Osaka Kyoiku Univ; Osaka Prefecture Univ (3 groups); Toyohashi Univ of Technology (aono); POSTECH; Ritsumeikan Univ; Univ of Sheffield; Queens College; Tokyo Denki Univ; Univ of Tsukuba; Queensland Univ of Technology; Tokyo Institute of Technology; Toshiba / NewsWatch; Tokyo Metropolitan Univ
[QAC] Univ of Aizu; Univ of Tokyo (kato); Aoyama Gakuin Univ; Univ of California, Berkeley; Yokohama National Univ; Carnegie Mellon Univ; Univ of Montreal; Hokkaido University (araki); Univ of Neuchatel
[OPINION] Chinese Academy of Sciences (ISCAS); Univ of Nottingham; Cornell Univ; NTT-CS; Yahoo! Japan; Illinois Institute of Technology; Ritsumeikan Univ; Information and Communications Univ; Toyohashi Univ of Technology (akiba)
[CLQA] Chinese Academy of Sciences (ISCAS); Yokohama National Univ; Aoyama Gakuin Univ; National Chiao Tung Univ; Carnegie Mellon Univ; National Institute of Informatics; Chinese Academy of Sciences (ICT); NICT; Academia Sinica; NEC (Internet Systems Research Labs); Mount Holyoke College; National Central Univ; Chinese Univ of Hong Kong; National Cheng Kung Univ; Toyohashi Univ of Technology (seki); Queens College; Univ of Maryland; State Univ of New York at Albany; Univ of Sheffield; Tokyo Institute of Technology (Furui); Toyohashi Univ of Technology (Akiba); Yokohama National Univ
Geographical Distribution of Participants
Geographical Distribution of Active Participants
Canada, Ireland, USA, Switzerland, UK, China PRC, Hong Kong, Japan, Korea, Singapore, Taiwan ROC, Australia
What Was New to NTCIR-4
- Open Submission Session
- ACM-TALIP Special Issue Recommendation
- Open Attendance
- Submission Raw Data
- Online Working Notes and Slides
What’s New to NTCIR-5
- Open Submission >>>> continued
- Special Issue on Patent at IP&M
- Open Attendance >>>> continued
- Submission Raw Data >>>> continued
- Online Proceedings and Slides >>>> Proceedings Only (No working notes)
- Pilot: MuST
What’s New to NTCIR-6
- Open Submission >>>> enhanced to EVIA
- Special Issue on Patent (IP&M) published
- Open Attendance >>>> continued
- Submission Raw Data >>>> part of participants’ dataset
- Online Proceedings and Slides >>>> continued
  + Proceedings Only (No working notes) >>>> continued
  + Publisher’s version (page # and running title)
  + CD contains draft papers
- Pilot: MuST, Opinion
- Multiple Collections (CLIR, PATENT)
Multiple TCs
• For more stable/robust evaluation
• Improvements from previous years
  – CLIR, Patent IR (using NTCIR-3, -4, -5)
• For larger test sets with reasonable/manageable work
  – Patent IR (using NTCIR-3, -4, -5, -6 collections)
Need a large # of topics, but resources are limited:
- 34 topics: relevance judgments by human experts
- x K topics of judgments by external searchers
- x 10K topics of judgments by patent examiners (a few relevant docs per topic). Similar to click-through on the Web.
NTCIR Workshop 6 (2006-2007) Organizers
+CLIR: Hsin-Hsi Chen, NTU; Kuang-hua Chen, NTU; Kazuaki Kishida, Surugadai U; Kazuko Kuriyama, Shirayuri U; Sukhoon Lee, NCU
+CLQA: Kuang-hua Chen, NTU; Chuan-Jie Lin, Nat Taiwan Ocean U; Yutaka Sasaki, ATR
+OPINION: Hsin-Hsi Chen, NTU; David K Evans, NII; Lun-Wei Ku, NTU; Chin-Yew Lin, Microsoft Research Asia; Yohei Seki, Toyohashi U Tech
+PATENT: Atsushi Fujii, Tsukuba U; Makoto Iwayama, Hitachi/TITEC
+QA: Junichi Fukumoto, Ritsumeikan U; Tsuneaki Kato, U Tokyo; Fumito Masui, Mie U; Tatsunori Mori, Yokohama Nat U
+MuST [pilot workshop]: Tsuneaki Kato, Tokyo Univ; Mitsunori Matsushita, NTT
Program chair: Noriko Kando, NII
Acknowledgment
• Central Daily News
• China Daily News
• China Times Inc.
• Chosunilbo
• Hankooki.com
• Industrial Property Cooperation Center
• Japan Patent Office
• Japan Patent Information Organization
• Korea Economic Daily
• Linguistic Data Consortium
• Mainichi Newspaper
• Nippon Database Kaihatsu, Co. Ltd.
• NTT
• NRI Cyber Patent
• PATOLIS
• the Sing Tao Group
• Taiwan News
• Tokyo Univ
• UDN.COM
• Wisers Information Ltd.
• Yomiuri Shinbun
Cross-Language Information Retrieval (CLIR) Task
Task Organizers: Kazuaki Kishida*, Kuang-hua Chen, Sukhoon Lee, Hsin-Hsi Chen, Noriko Kando, Kazuko Kuriyama
Design of NTCIR-6 CLIR Task
• STAGE1: ad hoc retrieval on multilingual IR (MLIR), bilingual IR (BLIR), and single language IR (SLIR)
• STAGE2: cross-collection analysis using old test collections from NTCIR-3 to -5 >>> A New Challenge
  – Purpose: To obtain the more reliable
  – Run the same system across 3 TCs
Evaluation
• Measures
  – Official: trec_eval
    • Mean average precision (MAP), R-precision, Recall-Precision graph, B-pref, etc.
  – Add: multi-grade relevance based metrics
    • nDCG, Q-measure, Generalized average precision (GAP)
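As a rough illustration of the difference between a binary measure such as average precision and a multi-grade measure such as nDCG, the measures named above can be sketched as follows. This is a minimal sketch under stated assumptions, not the trec_eval implementation or the official NTCIR scoring: the function names, the document IDs, the gain values (3, 2, 1), and the log2 discount are all hypothetical choices made for the example.

```python
import math

def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic, binary relevance."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """MAP: mean of per-topic AP over (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (assumed)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(ranked_ids, gain_by_doc):
    """nDCG for one topic; gain_by_doc maps doc ID -> graded gain."""
    run_gains = [gain_by_doc.get(d, 0) for d in ranked_ids]
    ideal_gains = sorted(gain_by_doc.values(), reverse=True)[:len(ranked_ids)]
    ideal = dcg(ideal_gains)
    return dcg(run_gains) / ideal if ideal > 0 else 0.0

# Hypothetical topic: gains 3/2/1 stand in for multi-grade relevance levels.
gains = {"d5": 3, "d1": 2, "d2": 1}
run = ["d3", "d1", "d7", "d2"]
ap = average_precision(run, set(gains))
score = ndcg(run, gains)
```

In this example the binary measure treats d1, d2, and d5 as equally relevant, while nDCG penalizes the run more heavily for missing the highest-graded document d5.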