SLIDE 1

Overview of the ACLIA IR4QA (Information Retrieval for Question Answering) Task

Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Teruko Mitamura, Donghong Ji, Kuang-Hua Chen, Eric Nyberg

18th December 2008 @NTCIR-7, Tokyo

SLIDE 2

TALK OUTLINE

  • 1. Task Objectives
  • 2. Relevance Assessments
  • 3. Evaluation Metrics
  • 4. Participating Teams
  • 5. Official Results
  • 6. Lazy Evaluation
  • 7. Unanswered Questions

SLIDE 3

What are the effective IR techniques for QA?

SLIDE 4

Traditional “ad hoc” IR vs IR4QA

  • Ad hoc IR (evaluated using Average Precision etc.)
  • Find as many (partially or marginally) relevant documents as possible and put them near the top of the ranked list
  • IR4QA (evaluated using… WHAT?)
  • Find relevant documents containing different correct answers?
  • Find multiple documents supporting the same correct answer to enhance the reliability of that answer?
  • Combine partially relevant documents A and B to deduce a correct answer?

SLIDE 5

TALK OUTLINE

  • 1. Task Objectives
  • 2. Relevance Assessments
  • 3. Evaluation Metrics
  • 4. Participating Teams
  • 5. Official Results
  • 6. Lazy Evaluation
  • 7. Unanswered Questions

SLIDE 6

Pooling for relevance assessments

[Diagram: Runs 1..N (each of depth 1000) over the target documents contribute their top documents to a per-topic pool of depth >= 30, which is then judged. Languages: CS = Simplified Chinese, CT = Traditional Chinese, JA = Japanese. Relevance levels: L2 = relevant, L1 = partially relevant, L0 = judged nonrelevant.]

SLIDE 7

Different pool depths for different topics

  • Assess the depth-30 pool (mandatory for all topics)
  • Assess the depth-50 pool (minus the depth-30 pool)
  • Assess the depth-70 pool (minus the depth-50 pool)
  • Assess the depth-90 pool (minus the depth-70 pool)
  • Assess the depth-100 pool (minus the depth-90 pool)

See the IR4QA Overview, Tables 29-31, for details. A sketch of this incremental pooling follows below.

Relevance assessments were coordinated independently by Donghong Ji (CS), Chuan-Jie Lin (CT) and Noriko Kando (JA).
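
A minimal sketch of this incremental, depth-X pooling, assuming each run is simply a ranked list of document IDs per topic; this is an illustration, not the official NTCIR tooling.

```python
# Sketch of incremental depth-X pooling (illustrative, not the official NTCIR tool).
# Each run is a ranked list of doc IDs for one topic (up to depth 1000).

def pool(runs, depth):
    """Union of the top-`depth` documents over all runs."""
    docs = set()
    for ranked_list in runs:
        docs.update(ranked_list[:depth])
    return docs

def assessment_batches(runs, depths=(30, 50, 70, 90, 100)):
    """Yield (depth, new_docs): the depth-X pool minus everything
    already pooled at shallower depths."""
    judged = set()
    for d in depths:
        new_docs = pool(runs, d) - judged
        judged |= new_docs
        yield d, new_docs

# Toy example with two runs and small depths:
runs = [["d3", "d1", "d7", "d2"], ["d1", "d5", "d3", "d9"]]
for depth, batch in assessment_batches(runs, depths=(2, 3, 4)):
    print(depth, sorted(batch))
```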

SLIDE 8

Sorting the pooled documents for assessors

  • Traditional approach: docs sorted by IDs
  • IR4QA approach: sort docs in the depth-X pool by:
  • #runs containing the doc at or above rank X (primary sort key)
  • Sum of ranks of the doc within these runs (secondary sort key)

Present “popular” documents first! (A sketch of this sort follows below.)
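
A minimal sketch of this assessor-side sort, assuming each run is a ranked list of doc IDs and X is the pool depth; the function name is illustrative, not part of the IR4QA tools.

```python
# Sketch of the assessor-side sort (illustrative).
# Primary key: number of runs that retrieved the doc at or above rank X (descending).
# Secondary key: sum of the doc's ranks within those runs (ascending).

def sort_pool_for_assessors(runs, depth):
    n_runs = {}      # doc -> number of runs containing it in the top `depth`
    rank_sum = {}    # doc -> sum of its (1-based) ranks within those runs
    for ranked_list in runs:
        for rank, doc in enumerate(ranked_list[:depth], start=1):
            n_runs[doc] = n_runs.get(doc, 0) + 1
            rank_sum[doc] = rank_sum.get(doc, 0) + rank
    # "Popular" docs first: retrieved by many runs, with low summed ranks.
    return sorted(n_runs, key=lambda d: (-n_runs[d], rank_sum[d]))

runs = [["d3", "d1", "d7"], ["d1", "d5", "d3"], ["d1", "d3", "d9"]]
print(sort_pool_for_assessors(runs, depth=3))  # d1 and d3 come first
```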

SLIDE 9

Assumptions behind the sort

  • 1. Popular docs are more likely to be relevant than others. Supported by [Sakai and Kando EVIA 08].
  • 2. If relevant docs are concentrated near the top of the list to be assessed, the assessors can judge more efficiently and consistently. At NTCIR-2, the assessors actually did not like doc lists sorted by doc IDs. (But we need more empirical evidence.)

SLIDE 10

TALK OUTLINE

  • 1. Task Objectives
  • 2. Relevance Assessments
  • 3. Evaluation Metrics
  • 4. Participating Teams
  • 5. Official Results
  • 6. Lazy Evaluation
  • 7. Unanswered Questions

SLIDE 11

Average Precision (AP)

AP = \frac{1}{R} \sum_{r} I(r) \, P(r)

where P(r) is the precision at rank r, I(r) = 1 iff the doc at rank r is relevant, and R is the number of relevant docs.

  • Used widely since the advent of TREC
  • Mean over topics is referred to as “MAP”
  • Cannot handle graded relevance (but many IR researchers just love it)
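
A minimal sketch of AP as defined above, assuming binary relevance judgements per topic; the mean over topics gives MAP.

```python
# Sketch of (uninterpolated) Average Precision for one topic.
# `ranked_docs` is the system output; `relevant` is the set of relevant doc IDs (R = len(relevant)).

def average_precision(ranked_docs, relevant):
    if not relevant:
        return 0.0
    hits = 0
    ap_sum = 0.0
    for r, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:           # I(r) = 1
            hits += 1
            ap_sum += hits / r        # precision at rank r
    return ap_sum / len(relevant)     # divide by R, so unretrieved relevant docs still count

print(average_precision(["d2", "d9", "d1"], {"d1", "d2", "d4"}))  # (1/1 + 2/3) / 3 ≈ 0.556
```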

SLIDE 12

Q-measure (Q)

Q is based on the blended ratio at rank r, which combines Precision and normalised Cumulative Gain; the persistence parameter β is set to 1.

  • Generalises AP and handles graded relevance
  • Properties similar to AP, and higher discriminative power
  • Not widely used, but has been used for QA and INEX as well as IR
  • [Sakai and Robertson EVIA 08] provides a user model for AP and Q
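
A minimal sketch of Q, assuming the blended-ratio definition from Sakai's papers, BR(r) = (C(r) + β·cg(r)) / (r + β·cg*(r)), averaged over the R relevant docs, with β = 1 as on the slide; the gain values for L1/L2 below are illustrative.

```python
# Sketch of Q-measure for one topic (blended-ratio definition; gain values are illustrative).

def q_measure(ranked_docs, qrels, gains={1: 1.0, 2: 3.0}, beta=1.0):
    """qrels maps doc -> relevance level (1 = partially relevant, 2 = relevant)."""
    R = sum(1 for lv in qrels.values() if lv > 0)
    if R == 0:
        return 0.0
    ideal_gains = sorted((gains[lv] for lv in qrels.values() if lv > 0), reverse=True)
    count_rel = 0    # C(r): number of relevant docs in the top r
    cg = 0.0         # cumulative gain of the system output at rank r
    cg_ideal = 0.0   # cumulative gain of the ideal ranked list at rank r
    total = 0.0
    for r, doc in enumerate(ranked_docs, start=1):
        if r <= len(ideal_gains):
            cg_ideal += ideal_gains[r - 1]
        level = qrels.get(doc, 0)
        if level > 0:
            count_rel += 1
            cg += gains[level]
            # Blended ratio BR(r) = (C(r) + beta*cg(r)) / (r + beta*cg*(r))
            total += (count_rel + beta * cg) / (r + beta * cg_ideal)
    return total / R

print(q_measure(["d2", "d9", "d1"], {"d1": 2, "d2": 1, "d4": 2}))  # ≈ 0.367
```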

SLIDE 13

nDCG (Microsoft version)

nDCG = (sum of discounted gains for a system output) / (sum of discounted gains for an ideal output)

  • Fixes a bug of the original nDCG
  • But lacks a parameter that reflects the user’s persistence
  • Most popular graded-relevance metric
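
A minimal sketch of this ratio, assuming the commonly used (2^rel - 1) / log2(1 + r) gain and discount of the Microsoft formulation; treat the exact gain and discount functions as an assumption rather than the official setting.

```python
import math

# Sketch of nDCG ("Microsoft version"): discounted gain summed over the system
# output, divided by the same sum over an ideal output.

def dcg(relevance_levels):
    return sum((2 ** lv - 1) / math.log2(1 + r)
               for r, lv in enumerate(relevance_levels, start=1))

def ndcg(ranked_docs, qrels, cutoff=1000):
    """qrels maps doc -> graded relevance level (e.g. 0 / 1 / 2)."""
    system = [qrels.get(doc, 0) for doc in ranked_docs[:cutoff]]
    ideal = sorted(qrels.values(), reverse=True)[:cutoff]
    ideal_dcg = dcg(ideal)
    return dcg(system) / ideal_dcg if ideal_dcg > 0 else 0.0

print(ndcg(["d2", "d9", "d1"], {"d1": 2, "d2": 1, "d4": 2}))  # ≈ 0.464
```
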
SLIDE 14

IR4QA evaluation package (works for ad hoc IR in general)

Computes AP, Q, nDCG, RBP, NCU [Sakai and Robertson EVIA 08] and so on.
http://research.nii.ac.jp/ntcir/tools/ir4qa_eval-en

SLIDE 15

TALK OUTLINE

  • 1. Task Objectives
  • 2. Relevance Assessments
  • 3. Evaluation Metrics
  • 4. Participating Teams
  • 5. Official Results
  • 6. Lazy Evaluation
  • 7. Unanswered Questions

SLIDE 16

  • 12 participants from China/Taiwan, USA, Japan
  • 40 CS runs (22 CS-CS monolingual, 18 EN-CS crosslingual)
  • 26 CT runs (19 CT-CT monolingual, 7 EN-CT crosslingual)
  • 25 JA runs (14 JA-JA monolingual, 11 EN-JA crosslingual)

SLIDE 17

Oral presentations

  • RALI (CS-CS, EN-CS, CT-CT, EN-CT)
  • Uses Wikipedia to extract cue words for BIOGRAPHY; extracts person names using Wikipedia and Google; uses Google translation
  • CYUT (EN-CS, EN-CT, EN-JA)
  • Uses Wikipedia for query expansion and translation; uses Google translation
  • MITEL (EN-CS, CT-CT)
  • Uses SMT and Baidu for translation; data fusion
  • CMUJAV (CS-CS, EN-CS, JA-JA, EN-JA)
  • Proposes Pseudo-Relevance Feedback using Lexico-Semantic Patterns (LSP-PRF)

SLIDE 18

Other interesting approaches

  • BRKLY (JA-JA) A very experienced TREC/NTCIR participant
  • HIT (EN-CS) PRF most successful
  • KECIR (CS-CS) Query expansion length optimised for each question type (definition, biography…)
  • NLPAI (CS-CS) Uses question analysis files from other teams (next slide)
  • NTUBROWS (CT-CT) Query term filtering, data fusion
  • OT (CS-CS, CT-CT, JA-JA) Data fusion-like PRF
  • TA (EN-JA) SMT document translation from NTCIR-6
  • WHUCC (CS-CS) Document reranking

Please visit the posters of all 12 IR4QA teams!

SLIDE 19

NLPAI (CS-CS) used question analysis files from other teams.

Different teams come up with different sets of query terms with different weights, and this clearly affects retrieval performance. For example, for the same question (宇宙大爆炸理论, “the Big Bang theory”), three teams’ question analysis files contain:

CSWHU-CS-CS-01-T:
<KEYTERMS>
  <KEYTERM SCORE="1.0">宇宙大爆炸</KEYTERM>
  <KEYTERM SCORE="0.3">理论</KEYTERM>
</KEYTERMS>

Apath-CS-CS-01-T:
<KEYTERMS>
  <KEYTERM SCORE="1.0">宇宙大爆炸理论</KEYTERM>
</KEYTERMS>

CMUJAV-CS-CS-01-T:
<KEYTERMS>
  <KEYTERM SCORE="1.0">宇宙</KEYTERM>
  <KEYTERM SCORE="1.0">大</KEYTERM>
  <KEYTERM SCORE="1.0">爆炸</KEYTERM>
  <KEYTERM SCORE="1.0">理论</KEYTERM>
  <KEYTERM SCORE="1.0">宇宙 大 爆炸 理论</KEYTERM>
  <KEYTERM SCORE="1.0">宇宙大爆炸理论</KEYTERM>
  <KEYTERM SCORE="1.0">宇宙 大 爆炸</KEYTERM>
  <KEYTERM SCORE="1.0">宇宙大爆炸</KEYTERM>
</KEYTERMS>

Special thanks to Maofu Liu (NLPAI).

SLIDE 20

TALK OUTLINE

  • 1. Task Objectives
  • 2. Relevance Assessments
  • 3. Evaluation Metrics
  • 4. Participating Teams
  • 5. Official Results
  • 6. Lazy Evaluation
  • 7. Unanswered Questions

SLIDE 21

CS T-runs: Top 3 teams

Mean AP                       Mean Q                        Mean nDCG
OT-CS-CS-04-T       .6337     OT-CS-CS-04-T       .6490     OT-CS-CS-04-T       .8270 *
MITEL-EN-CS-03-T    .5959     MITEL-EN-CS-03-T    .6124     CMUJAV-CS-CS-02-T   .7951
CMUJAV-CS-CS-02-T   .5930     CMUJAV-CS-CS-02-T   .6055     MITEL-EN-CS-01-T    .7949

  • MITEL is very good even though it is a crosslingual run
  • OT significantly outperforms CMUJAV with Mean nDCG (two-sided bootstrap test; α=0.05; a sketch of such a test follows below)
  • nDCG disagrees with AP and Q
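
A minimal sketch of a two-sided paired bootstrap test over per-topic scores (e.g. nDCG for run X vs run Y). It simplifies the bootstrap procedure described in Sakai's papers, so treat it as an illustration rather than the exact test behind the official results.

```python
import random

# Sketch of a two-sided paired bootstrap significance test over per-topic scores.

def bootstrap_test(scores_x, scores_y, trials=10000, seed=0):
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(scores_x, scores_y)]
    observed = sum(diffs) / len(diffs)
    centred = [d - observed for d in diffs]          # centre the differences to mimic the null
    count = 0
    for _ in range(trials):
        sample = [rng.choice(centred) for _ in diffs]
        if abs(sum(sample) / len(sample)) >= abs(observed):
            count += 1
    return observed, count / trials                  # (mean difference, p-value)

# Toy example with 10 topics:
x = [0.82, 0.75, 0.90, 0.66, 0.71, 0.88, 0.79, 0.84, 0.69, 0.77]
y = [0.78, 0.70, 0.85, 0.67, 0.65, 0.80, 0.75, 0.80, 0.70, 0.72]
mean_diff, p = bootstrap_test(x, y)
print(f"mean diff = {mean_diff:.4f}, p = {p:.4f}")   # significant at alpha=0.05 if p < 0.05
```
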
SLIDE 22

CT T-runs: Top 3 teams

Mean AP                        Mean Q                         Mean nDCG
MITEL-CT-CT-02-T    .5839      MITEL-CT-CT-02-T    .6018      MITEL-CT-CT-02-T    .7873
OT-CT-CT-04-T       .5521 **   OT-CT-CT-04-T       .5724 **   OT-CT-CT-04-T       .7656 **
RALI-CT-CT-05-T     .3952      RALI-CT-CT-05-T     .4096      RALI-CT-CT-05-T     .6559 **

  • MITEL and OT are not significantly different from each other
  • OT significantly outperforms RALI (two-sided bootstrap test; α=0.01), but RALI’s performance is actually very high after a bug fix

SLIDE 23

JA T-runs: Top 3 teams

Mean AP                        Mean Q                         Mean nDCG
OT-JA-JA-04-T       .6979 **   OT-JA-JA-04-T       .7090 **   OT-JA-JA-04-T       .8650 **
CMUJAV-JA-JA-01-T   .5932      CMUJAV-JA-JA-01-T   .5996      CMUJAV-JA-JA-01-T   .7832
BRKLY-JA-JA-02-T    .5838 **   BRKLY-JA-JA-02-T    .5996 **   BRKLY-JA-JA-02-T    .7831 **

  • OT significantly outperforms CMUJAV
  • BRKLY significantly outperforms the 4th team (a CYUT crosslingual run) (two-sided bootstrap test; α=0.01)

SLIDE 24

System ranking by Q/nDCG vs that by AP

[Ranking-comparison plots for CS, CT and JA.] By definition, nDCG is more forgiving for low-recall runs than AP and Q.

SLIDE 25

The most “novel” runs

[Diagram: relevant docs retrieved by Run A vs relevant docs retrieved by all other teams; the difference is Run A’s unique relevant docs.]

  • RALI-EN-CS-04-T found 63 unique relevant docs (53 for topic CS-T42)
  • RALI-EN-CT-05-T found 32 unique relevant docs (16 for topic CT-T442)
  • OT-JA-JA-01-T found 51 unique relevant docs (12 for topic JA-T236)

These runs are valuable for making the relevance assessments as exhaustive as possible.

SLIDE 26

Successful PRF

Run                  Mean AP    Mean Q     Mean nDCG
HIT-EN-CS-01-DN      .5690 **   .5840 **   .7560 **
HIT-EN-CS-02-DN      .4634      .4827      .6910
OT-CT-CT-04-T        .5521 **   .5724 **   .7656 **
OT-CT-CT-02-T        .5111      .5339      .7432
BRKLY-JA-JA-02-T     .5838 *    .5996 **   .7831 **
BRKLY-JA-JA-03-T     .5407      .5509      .7475
OT-JA-JA-04-T        .6979 *    .7090 *    .8650 **
OT-JA-JA-02-T        .6698      .6808      .8473

Other teams appear to be less successful with PRF. This may be partly because the qrels are very incomplete.

SLIDE 27

Per-topic AP/Q/nDCG averaged over runs (CS)

“Topic difficulty” varies.

SLIDE 28

Per-topic AP/Q/nDCG averaged over runs (CT)

“Topic difficulty” varies.

SLIDE 29

Per-topic AP/Q/nDCG averaged over runs (JA)

“Topic difficulty” varies.

SLIDE 30

TALK OUTLINE

  • 1. Task Objectives
  • 2. Relevance Assessments
  • 3. Evaluation Metrics
  • 4. Participating Teams
  • 5. Official Results
  • 6. Lazy Evaluation
  • 7. Unanswered Questions

SLIDE 31

Forming pseudo-qrels

QUESTION: Can we get away with not doing any relevance assessments at all?

  • 1. Sort the pooled docs by (1) the number of runs that retrieved the doc, and then (2) the sum of its ranks within these runs.
  • 2. Take the top 10 docs in the sorted pool and treat them all as L1-relevant! (A sketch follows below.)

[Sakai and Kando EVIA 08] actually shows that the top 10 docs are more likely to be relevant than others on average.
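
A minimal sketch of this pseudo-qrel construction, reusing the "popularity" sort from Slide 8; the function name and the depth parameter are illustrative.

```python
# Sketch of pseudo-qrel construction (illustrative, not the official tooling).
# Sort pooled docs by (1) number of runs retrieving them, then (2) sum of ranks,
# and declare the top `top` docs L1-relevant.

def pseudo_qrels(runs, pool_depth=1000, top=10):
    n_runs, rank_sum = {}, {}
    for ranked_list in runs:
        for rank, doc in enumerate(ranked_list[:pool_depth], start=1):
            n_runs[doc] = n_runs.get(doc, 0) + 1
            rank_sum[doc] = rank_sum.get(doc, 0) + rank
    ordered = sorted(n_runs, key=lambda d: (-n_runs[d], rank_sum[d]))
    return {doc: 1 for doc in ordered[:top]}   # everything else stays unjudged

runs = [["d3", "d1", "d7", "d2"], ["d1", "d5", "d3", "d9"], ["d1", "d3", "d2", "d8"]]
print(pseudo_qrels(runs, top=3))  # {'d1': 1, 'd3': 1, 'd2': 1} — the most "popular" docs
```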

SLIDE 32

System ranking by real MAP vs that by pseudo MAP (CS)

“Pseudo MAP” assumes that “popular” documents are relevant.

SLIDE 33

System ranking by real MAP vs that by pseudo MAP (CT)

SLIDE 34

System ranking by real MAP vs that by pseudo MAP (JA)

Pseudo-qrels are not very useful for predicting the ranking of the highest performers, but they may be useful for predicting the low performers (for CT and JA).

Kendall’s rank correlation, pseudo vs real: around 0.7 (cf. Soboroff SIGIR 01: around 0.4). (A sketch of the computation follows below.)
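
A minimal sketch of the rank-correlation comparison, assuming one mean score per run under real qrels and under pseudo-qrels; ties are ignored for simplicity, so this is an illustration rather than the exact computation.

```python
# Sketch of Kendall's rank correlation between the ranking by real MAP and
# the ranking by pseudo MAP (ties ignored for simplicity).

def kendall_tau(scores_a, scores_b):
    """scores_a, scores_b: dicts mapping run ID -> mean score under two qrel sets."""
    runs = list(scores_a)
    concordant = discordant = 0
    for i in range(len(runs)):
        for j in range(i + 1, len(runs)):
            da = scores_a[runs[i]] - scores_a[runs[j]]
            db = scores_b[runs[i]] - scores_b[runs[j]]
            if da * db > 0:
                concordant += 1
            elif da * db < 0:
                discordant += 1
    n_pairs = len(runs) * (len(runs) - 1) // 2
    return (concordant - discordant) / n_pairs

real_map   = {"run1": 0.63, "run2": 0.59, "run3": 0.55, "run4": 0.40}
pseudo_map = {"run1": 0.30, "run2": 0.33, "run3": 0.25, "run4": 0.18}
print(kendall_tau(real_map, pseudo_map))  # ≈ 0.667: one pair swapped, five agree
```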

SLIDE 35

TALK OUTLINE

  • 1. Task Objectives
  • 2. Relevance Assessments
  • 3. Evaluation Metrics
  • 4. Participating Teams
  • 5. Official Results
  • 6. Lazy Evaluation
  • 7. Unanswered Questions

SLIDE 36

Unanswered Questions

  • What IR strategies are good for QA? (e.g. How does question classification help?)
  • What are the general/language-specific challenges for mono/crosslingual IR4QA?
  • How incomplete are the IR4QA test collections? How reusable are they?
  • What are the best evaluation methods?
  • How do IR4QA and the entire ACLIA results correlate?