
Evolution of NTCIR: Infrastructure of Large-Scale Information Access Technologies Evaluation and Testing
Noriko Kando, National Institute of Informatics, Japan


  1. Evolution of NTCIR: Infrastructure of Large-Scale Information Access Technologies Evaluation and Testing. Noriko Kando, National Institute of Informatics, Japan. http://research.nii.ac.jp/ntcir/ (from November 2009: http://ntcir.nii.ac.jp/); kando (at) nii. ac. jp. With thanks to Tetsuya Sakai for the slides. Presented at NTCIR@CLEF, 2009-10-01.

  2. NTCIR (NII Test Collection for Information Retrieval): research infrastructure for evaluating information access (IA). A series of evaluation workshops designed to enhance research in information-access technologies by providing an infrastructure for large-scale evaluations: data sets, evaluation methodologies, and a forum. The project started in late 1997, with a workshop once every 18 months. Data sets (test collections): scientific, news, patent, and web documents in Chinese, Korean, Japanese, and English. Tasks: IR (cross-lingual, patent, web, Geo), QA (monolingual and cross-lingual), summarization, trend information, patent maps, opinion analysis, and text mining. Community-based research activities. [Chart: number of participating groups and countries, NTCIR-1 through NTCIR-7.] NTCIR-7 had 82 participating groups from 15 countries.

  3. Tasks (research areas) of past NTCIR workshops, NTCIR-1 ('99) through NTCIR-6 ('07): Japanese IR (news and scientific documents), cross-lingual IR, patent retrieval (including patent map/classification), web retrieval (including navigational and Geo tasks), result classification, term extraction, question answering, information access dialogue, summarization (including evaluation metrics), cross-lingual text summarization, trend information, and opinion analysis.

  4. NTCIR-7 clusters (2007.09 to 2008.12):
  Cluster 1, Advanced CLIA: Complex CLQA (Chinese, Japanese, English); IR for QA (Chinese, Japanese, English).
  Cluster 2, User-Generated: Multilingual Opinion Analysis.
  Cluster 3, Focused Domain (Patent): Patent Translation (English to Japanese); Patent Mining (paper to IPC).
  Cluster 4, MuST: Multi-modal Summarization of Trends.
  Held alongside: the 2nd International Workshop on Evaluating Information Access (EVIA) and a MuST visualization challenge.

  5. NTCIR-8 clusters (2008.07 to 2009.06):
  Advanced CLIA: Complex CLQA (Chinese, Japanese); IR for QA (Chinese, Japanese).
  GeoTime Retrieval (English, Japanese) [new].
  User-Generated: Multilingual Opinion Analysis (news); [pilot] Community QA, using Yahoo! Answers Japan [new]; [pilot?] Multilingual Opinion Analysis (blog).
  Focused Domain cluster (Patent): Patent Translation (English to Japanese); Patent Mining (paper to IPC); Evaluation [new].
  Held alongside: the 3rd International Workshop on Evaluating Information Access (EVIA).
  Registration is still open; you are very much welcome to join us!

  6. NTCIR-7 Advanced CLIA organizers: Teruko Mitamura (CMU), Eric Nyberg (CMU), Ruihua Chen (MSRA), Fred Gey (UCB), Donghong Ji (Wuhan Univ.), Noriko Kando (NII), Chin-Yew Lin (MSRA), Chuan-Jie Lin (National Taiwan Ocean Univ.), Tsuneaki Kato (Tokyo Univ.), Tatsunori Mori (Yokohama National Univ.), Tetsuya Sakai (NewsWatch). Advisor: K. L. Kwok (Queens College).

  7. Complex Cross-lingual Question Answering (CCLQA) task: different teams can exchange components and create a "dream-team" QA system; small teams that do not possess an entire QA system can still contribute; the IR and QA communities can collaborate.

  8. CCLQA = Complex CLQA: moving towards advanced complex questions, from the factoid questions of NTCIR-5 and NTCIR-6. Four question types: events, biographies, definitions, and relationships. Examples of complex questions: definition questions ("What is the Human Genome Project?"), relationship questions ("What is the relationship between Saddam Hussein and Jacques Chirac?"), event questions ("List major events in the formation of the European Union."), and biography questions ("Who is Kim Jong-Il?").

  9. ACLIA evaluation: the EPAN tool. [Screenshot of the tool.]

  10. ACLIA evaluation (continued). CCLQA: nugget pyramid evaluation and automatic evaluation. IR4QA: MAP, MS (Microsoft-version) nDCG, and Q-measure (preference-based).
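
The slide does not spell out how the nugget pyramid score is computed. As a rough illustration only, here is a minimal Python sketch of pyramid-style nugget F-scoring in the spirit of the TREC/ACLIA definition-QA evaluations; the function name, the 100-character allowance per matched nugget, and beta = 3 are assumptions for illustration, not the official ACLIA implementation.

```python
# Minimal sketch of nugget-pyramid F-scoring for one answer (assumptions noted above).

def nugget_pyramid_f(matched_weights, all_weights, answer_length,
                     matched_count, beta=3.0, allowance_per_nugget=100):
    """Compute a pyramid-style nugget F-score for a single system answer.

    matched_weights: pyramid weights of the nuggets the answer covers
    all_weights:     pyramid weights of every nugget for this question
    answer_length:   length of the answer in non-whitespace characters
    matched_count:   number of distinct nuggets matched (sets the length allowance)
    """
    # Weighted recall: credit is proportional to the pyramid weight of matched nuggets.
    recall = sum(matched_weights) / sum(all_weights)

    # Length-based "precision": a fixed character allowance per matched nugget,
    # beyond which the answer is penalised for verbosity.
    allowance = allowance_per_nugget * matched_count
    if answer_length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (answer_length - allowance) / answer_length

    # F-beta with beta = 3 emphasises recall, as in nugget-based QA evaluations.
    if precision == 0 and recall == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)


# Example: an answer covering nuggets with weights 3 and 1 out of a pyramid
# totalling 3 + 2 + 1 + 1, using 250 non-whitespace characters.
print(nugget_pyramid_f([3, 1], [3, 2, 1, 1], answer_length=250, matched_count=2))
```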

  11. Traditional "ad hoc" IR vs. IR4QA. Ad hoc IR (evaluated using Average Precision, etc.): find as many (partially or marginally) relevant documents as possible and put them near the top of the ranked list. IR4QA (evaluated using... what?): find relevant documents containing different correct answers? Find multiple documents supporting the same correct answer, to enhance the reliability of that answer? Combine partially relevant documents A and B to deduce a correct answer?

  12. Average Precision (AP): AP = (1/R) Σ_r I(r) P(r), where R is the number of relevant documents, I(r) is 1 iff the document at rank r is relevant, and P(r) is the precision at rank r. Used widely since the advent of TREC; the mean over topics is referred to as "MAP". AP cannot handle graded relevance (but many IR researchers just love it).
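
As a concrete reading of the formula above, here is a minimal Python sketch of Average Precision for a single topic; the function and parameter names are illustrative.

```python
# Minimal sketch: Average Precision for one ranked list of binary relevance
# judgements (1 = relevant, 0 = not relevant); names are illustrative.

def average_precision(relevances, total_relevant=None):
    """relevances: relevance (0/1) of the document at each rank, top first.
    total_relevant: R, the number of relevant documents for the topic;
    defaults to the number of relevant documents in the list."""
    R = total_relevant if total_relevant is not None else sum(relevances)
    if R == 0:
        return 0.0
    hits, ap = 0, 0.0
    for r, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            ap += hits / r       # precision at rank r, counted only at relevant ranks
    return ap / R


# Relevant documents at ranks 1, 3 and 6, with R = 4 relevant documents in total.
print(average_precision([1, 0, 1, 0, 0, 1], total_relevant=4))  # ~0.5417
```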

  13. Q-measure (Q): Q = (1/R) Σ_r I(r) BR(r), where BR(r) is the blended ratio at rank r, which combines precision and normalised cumulative gain: BR(r) = (C(r) + β cg(r)) / (r + β cg*(r)), with C(r) the number of relevant documents up to rank r, cg(r) the cumulative gain at rank r, cg*(r) the cumulative gain at rank r of an ideal ranked list, and the persistence parameter β set to 1. Q generalises AP and handles graded relevance; its properties are similar to AP, with higher discriminative power [Sakai and Robertson, EVIA 2008]. It is not widely used, but it provides a user model for AP and Q, and it has been used for QA and INEX as well as IR.
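
Along the same lines, a minimal sketch of Q-measure under the blended-ratio definition above (with beta = 1); the gain values and names are illustrative assumptions.

```python
# Minimal sketch of Q-measure for one ranked list with graded relevance,
# following the blended-ratio definition above (beta = 1); names are illustrative.

def q_measure(gains, ideal_gains, beta=1.0):
    """gains:       gain of the document at each rank of the system output
                    (0 for non-relevant documents), top first.
    ideal_gains:    gains of all relevant documents, used to build the
                    ideal ranked list and to obtain R."""
    ideal = sorted(ideal_gains, reverse=True)
    R = len(ideal)
    if R == 0:
        return 0.0
    cg = cg_ideal = 0.0     # cumulative gain of the system / ideal list
    count = 0               # C(r): relevant documents seen so far
    q = 0.0
    for r, g in enumerate(gains, start=1):
        cg_ideal += ideal[r - 1] if r <= R else 0.0
        if g > 0:
            count += 1
            cg += g
            # blended ratio at rank r
            q += (count + beta * cg) / (r + beta * cg_ideal)
    return q / R


# Graded gains (e.g. 3 = highly relevant, 1 = partially relevant):
print(q_measure([3, 0, 1, 0, 1], ideal_gains=[3, 2, 1, 1]))
```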

  14. nDCG (Microsoft version): the sum of discounted gains for the system output, divided by the sum of discounted gains for an ideal output. This version fixes a bug of the original nDCG, but lacks a parameter that reflects the user's persistence. It is the most popular graded-relevance metric.
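
And a minimal sketch of the Microsoft-style nDCG described above, with a log2(1 + r) discount applied at every rank; the gain values and cutoff are illustrative choices, not fixed by the slide.

```python
import math

# Minimal sketch of Microsoft-style nDCG: discounted gains for the system output
# divided by discounted gains for an ideal output, with a log2(1 + r) discount
# applied at every rank.

def ndcg(gains, ideal_gains, cutoff=None):
    """gains:       gain of the document at each rank of the system output, top first.
    ideal_gains:    gains of all relevant documents (any order)."""
    def dcg(gs):
        return sum(g / math.log2(1 + r) for r, g in enumerate(gs, start=1))

    ideal = sorted(ideal_gains, reverse=True)
    if cutoff is not None:
        gains, ideal = gains[:cutoff], ideal[:cutoff]
    denom = dcg(ideal)
    return dcg(gains) / denom if denom > 0 else 0.0


# Same graded run as before, evaluated at cutoff 5:
print(ndcg([3, 0, 1, 0, 1], ideal_gains=[3, 2, 1, 1], cutoff=5))
```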

  15. IR4QA evaluation package (works for ad hoc IR in general). Computes AP, Q, nDCG, RBP, NCU [Sakai and Robertson, EVIA 2008], and so on. http://research.nii.ac.jp/ntcir/tools/ir4qa_eval-en

  16. ACLIA participation: 12 participants from China/Taiwan, the USA, and Japan. 40 CS runs (22 monolingual CS-CS, 18 cross-lingual EN-CS); 26 CT runs (19 CT-CT, 7 EN-CT); 25 JA runs (14 JA-JA, 11 EN-JA).

  17. Major approaches:
  - CMUJAV (CS-CS, EN-CS, JA-JA, EN-JA): proposes pseudo-relevance feedback using lexico-semantic patterns (LSP-PRF).
  - CYUT (EN-CS, EN-CT, EN-JA): uses Wikipedia in several ways; post hoc results.
  - MITEL (EN-CS, CT-CT): SMT and Baidu used for translation; data fusion.
  - RALI (CS-CS, EN-CS, CT-CT, EN-CT): uses Wikipedia in several ways; high performance after a bug fix.

  18. Combining IR4QA and CCLQA (F3 scores):
  - EN-CS: IR by CMU, QA by ATR/NiCT, F3 = 0.2763
  - CS-CS: IR by KECIR, QA by Apath, F3 = 0.2695
  - EN-JA: IR by CMU, QA by Forst, F3 = 0.2873 (CMU: 0.1739)
  - JA-JA: IR by BRKLY, QA by CMU, F3 = 0.2611

  19. System ranking by Q/nDCG vs. that by AP (scatter plots for CS, CT, and JA). By definition, nDCG is more forgiving for low-recall runs than AP and Q.
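
One way to quantify the kind of ranking disagreement shown on this slide is a rank correlation between the two metric-induced system orderings. A minimal sketch with Kendall's tau follows; the system names and scores are made up for illustration (the slide itself only shows scatter plots).

```python
from itertools import combinations

# Minimal sketch: compare how two metrics (e.g. AP vs nDCG) rank the same systems
# using a simple Kendall's tau (ties contribute to neither count in this version).

def kendall_tau(scores_a, scores_b):
    """scores_a, scores_b: dicts mapping system name -> score under each metric."""
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        da = scores_a[s] - scores_a[t]
        db = scores_b[s] - scores_b[t]
        if da * db > 0:
            concordant += 1
        elif da * db < 0:
            discordant += 1
    pairs = len(systems) * (len(systems) - 1) / 2
    return 0.0 if pairs == 0 else (concordant - discordant) / pairs


# Made-up per-system scores under two metrics:
ap_scores   = {"sysA": 0.31, "sysB": 0.28, "sysC": 0.22, "sysD": 0.19}
ndcg_scores = {"sysA": 0.52, "sysB": 0.55, "sysC": 0.41, "sysD": 0.37}
print(kendall_tau(ap_scores, ndcg_scores))  # 1.0 = identical rankings; lower = swaps
```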
