NTCIR-9 Kick-Off Event


  1. Welcome! Twitter: #ntcir9 / Ust: ntcir-9-kick. NTCIR-9 Kick-Off Event, 2010.10.05. Japanese Session: 13:30-; English Session: 15:30-

  2. Program • About NTCIR • About NTCIR-9 • Accepted Tasks • Why participate? • How to participate • Important Dates • Q & A

  3. About NTCIR

  4. NTCIR: NII Testbeds and Community for Information access Research. Research Infrastructure for Evaluating IA: a series of evaluation workshops designed to enhance research in information-access technologies by providing an infrastructure for large-scale evaluations (data sets, evaluation methodologies, and a forum). The project started in late 1997 and runs once every 18 months. Data sets (test collections, or TCs): scientific documents, news, patents, and web, in Chinese, Korean, Japanese, and English. Tasks (research areas): IR (cross-lingual tasks, patents, web, Geo); QA (monolingual and cross-lingual tasks); summarization, trend information, patent maps; opinion analysis, text mining. Community-based research activities.

  5. Information retrieval (IR): retrieve RELEVANT information from a vast collection to meet users' information needs; done with computers since the 1950s. It was among the first areas of computer science to use human assessments as its success criteria: judgments vary, so systems are compared on the same evaluation infrastructure. Information access (IA): the whole process of making information usable by users, e.g. IR, text summarization, QA, text mining, and clustering.

  6. Tasks at Past NTCIRs (NTCIR-1 through -8; meeting years '99, '01, '02, '04, '05, '07, '08, '09; each task started 18 months before its meeting). Task areas across the rounds: User-Generated Contents (Community QA, Opinion Analysis); Cross-Lingual QA + IR; Module-Based IR for Focused Domains (GeoTemporal, Patent); Question Answering (Complex/Any Types, Dialog, Cross-Lingual, Factoid/List); Text Mining / Classification; Summarization / Trend Info (Visualization, Text Summarization); Retrieval (Web, Statistical MT, Cross-Lingual IR, Non-English Search, Ad Hoc IR / IR for QA Text Retrieval). [Original slide: a table marking which NTCIR rounds each task ran in.]

  7. Procedures in NTCIR Workshops • Call for Task Proposals • Selection of Task Proposals by Committee • Discussion about Experimental Designs and Evaluation Methods (can continue into the Formal Runs) • Registration for Task(s) • Deliver Training Data (Documents, Topics, Answers) • Experiments and Tuning by Each Participant • Deliver Test Data (Documents and Topics) • Experiments by Each Participant • Submission of Experimental Results • Pooling the Answer Candidates from the Submissions and Conducting Manual Judgments (see the pooling sketch below) • Return Answers (Relevance Judgments) and Evaluation Results • Workshop Meeting • Discussion for the Next Round
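
The pooling step above can be illustrated with a minimal sketch, assuming each run is a per-topic ranked list of document IDs; the function and parameter names below are hypothetical, not NTCIR's actual tooling. For each topic, the top-k documents from every submitted run are merged into one pool, which assessors then judge manually to produce the relevance judgments.

```python
from collections import defaultdict

def build_pools(runs, depth=100):
    """Merge the top-`depth` documents of every submitted run, per topic.

    runs: dict mapping run_id -> dict mapping topic_id -> ranked list of doc_ids
    Returns: dict mapping topic_id -> set of doc_ids to be judged manually.
    """
    pools = defaultdict(set)
    for run in runs.values():
        for topic_id, ranked_docs in run.items():
            pools[topic_id].update(ranked_docs[:depth])
    return dict(pools)

# Example: two runs, one topic; assessors would judge the union of each run's top 2.
runs = {
    "run_A": {"101": ["d3", "d7", "d1"]},
    "run_B": {"101": ["d7", "d9", "d2"]},
}
print(build_pools(runs, depth=2))  # pool for topic "101" contains d3, d7, d9
```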

  8. NTCIR Workshop Meeting: http://research.nii.ac.jp/ntcir/

  9. NTCIR-7 & -8 Program Committee: Mark Sanderson, Doug Oard, Atsushi Fujii, Tatsunori Mori, Fred Gey, Noriko Kando (and Ellen Voorhees, Sung Hyun Myaeng, Hsin-Hsi Chen, Tetsuya Sakai)

  10. NTCIR Test Collections. Test Collections = Docs + Topics/Questions + Answers. Available to non-participants for research purposes.
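
To make the Docs + Topics/Questions + Answers decomposition concrete, here is a minimal sketch of one way to hold a test collection in code; the class and field names are hypothetical and do not reflect the NTCIR distribution format.

```python
from dataclasses import dataclass, field

@dataclass
class TestCollection:
    """A test collection = documents + topics (search requests) + answers (relevance judgments)."""
    docs: dict[str, str] = field(default_factory=dict)          # doc_id -> document text
    topics: dict[str, str] = field(default_factory=dict)        # topic_id -> topic/question text
    answers: dict[str, set[str]] = field(default_factory=dict)  # topic_id -> relevant doc_ids

# Tiny illustrative instance (placeholder content only).
tc = TestCollection(
    docs={"d1": "sample document text"},
    topics={"101": "sample search request"},
    answers={"101": {"d1"}},
)
```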

  11. Focus of NTCIR: lab-type IR tests and new challenges at the intersection of IR + NLP, to make the information in documents more usable for users. Asian languages / cross-language; a variety of genres; parallel/comparable corpora; realistic evaluation and user tasks; interactive/exploratory search; QA-type topic creation. Also a forum for researchers and other experts/users: idea exchange, and discussion/investigation of evaluation methods and metrics.

  12. IR Systems Evaluation • Engineering level: efficiency • Input level: e.g. exhaustivity, quality, novelty of the DB • Process level: effectiveness, e.g. recall, precision • Output level: display of output • User level: e.g. effort that users need • Social level: e.g. importance (Cleverdon & Keen, 1966)
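
As a small illustration of the process-level measures (recall, precision) named above, here is a minimal sketch; the function name and inputs are assumptions for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Set-based effectiveness measures for one topic.

    retrieved: iterable of doc_ids returned by the system
    relevant:  set of doc_ids judged relevant for the topic
    """
    retrieved = set(retrieved)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 3 of the 4 retrieved documents are relevant; 5 documents are relevant in total.
print(precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d2", "d3", "d5", "d6"}))
# (0.75, 0.6)
```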

  13. Difficulty of retrieval varies with topics. [Figure: 11-point recall-precision curves by retrieval system, J-J Level1 D auto runs: effectiveness across SYSTEMS (systems A-P, averaged over 50 topics) and effectiveness across TOPICS (topics 101-129); axes are recall vs. precision.]
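
The 11-point curves on this slide can be sketched in a few lines. This is an illustrative example only (the function below is an assumption, not the evaluation code used at NTCIR): interpolated precision at recall level r is the maximum precision observed at any recall of at least r.

```python
def eleven_point_interpolated_precision(ranked_docs, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one topic."""
    hits, points = 0, []
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))  # (recall, precision)
    levels = [i / 10 for i in range(11)]
    # Interpolated precision at level r = max precision at any recall >= r.
    return [max((p for rec, p in points if rec >= r), default=0.0) for r in levels]

# Example: 2 relevant documents, both retrieved (at ranks 1 and 3).
print(eleven_point_interpolated_precision(["d1", "d5", "d2"], {"d1", "d2"}))
# 1.0 at recall levels up to 0.5, then 2/3 at the higher levels
```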

  14. Difficulty of retrieval varies with topics, and "difficult topics" vary with systems. For reliable and stable evaluation, using a substantial number of topics is necessary. [Figure: the same 11-point recall-precision curves as the previous slide, plus mean average precision per topic for requests #101-150, J-J Level1 D auto runs.]
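
Per-topic averaging like the mean-average-precision panel above can be illustrated with a small sketch (again a hypothetical example, not NTCIR's evaluation software): average precision is computed per topic and then averaged over the topic set, which is why a substantial number of topics stabilizes the score.

```python
def average_precision(ranked_docs, relevant):
    """Average precision of one ranked list for one topic."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """Mean of per-topic average precision.

    run:   dict topic_id -> ranked list of doc_ids
    qrels: dict topic_id -> set of relevant doc_ids
    """
    scores = [average_precision(run[t], qrels[t]) for t in qrels]
    return sum(scores) / len(scores) if scores else 0.0

run = {"101": ["d1", "d4", "d2"], "102": ["d9", "d3"]}
qrels = {"101": {"d1", "d2"}, "102": {"d3"}}
print(mean_average_precision(run, qrels))  # about 0.667: mean of per-topic APs (5/6 and 1/2)
```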

  15. What are TCs usable for evaluating? Analogy with pharmaceutical R & D: Phase I: in vitro experiments; Phase II: animal experiments; Phase III: tests with healthy human subjects; Phase IV: clinical tests.

  16. What are TCs usable for evaluating? For users' information-seeking tasks, NTCIR test collections map onto the same phases: Phase I: laboratory-type prototype testing; Phase II: sharing modules; Phase III: controlled interactive testing using human subjects; Phase IV: uncontrolled pre-operational testing. Pharmaceutical R & D: Phase I: in vitro experiments; Phase II: animal experiments; Phase III: tests with healthy human subjects; Phase IV: clinical tests. Levels of evaluation: 1. Engineering level (efficiency); 2. Input level; 3. Process level (effectiveness); 4. User level; 5. Output level; 6. Social level.

  17. Information Seeking Task: document types + user community; the user's situation and purpose of search, kept realistic. Experiments are abstractions of real-world tasks, with a trade-off between "reality" and "controllability". Testing & benchmarking aim to learn how and why a system works better (or worse) than others and how it can be improved: a scientific understanding of effectiveness.

  18. Improvement of Effectiveness by Evaluation Workshops: roughly 1.5 to 2 times in 3 years. [Figure: mean average precision of Cornell University's TREC systems from '92 to '98, evaluated on the TREC-1 through TREC-7 test sets.]

  19. Research Trends. [Figure: number of papers presented at ACM SIGIR by publication years ('77-79 through '05-09), broken down by topic: Web, User Evaluation, Non-Text, QA & Summarization, NLP, Cross-Lingual, ML, Clustering, Efficiency, Filtering, Query Processing, IR Models, General.]
