LT-Lab
DFKI at QA@Clef 2007
Günter Neumann, Bogdan Sacaleanu, Christian Spurk, Rui Wang
Language Technology Lab at DFKI Saarbrücken, Germany
Overview
✩ DFKI has participated since 2003
– Focus on German monolingual QA and German/English cross-lingual QA
– Promising results so far (accuracy): DE–DE = 43.50%, EN–DE = 32.98%, DE–EN = 25.50%
✩ Goal for Clef 2007: broaden the spectrum of activities
– Consideration of additional language pairs (ES–EN, PT–DE)
– Participation in the QAST pilot task
– Participation in the Answer Validation Exercise (AVE)
QA architecture – some design issues
✩ NL question
– Declarative description of the search strategy and control information
– Analysis should be as complete and accurate as possible
– Use of full parsing and semantic constraints
✩ Consider document sources as an implicit search space
– Off-line: question-type-oriented preprocessing for context selection
– On-line: question-specific preprocessing for answer processing
Common architecture for different answer pools
✩ Answer sources (covered by our technology)
– Structured sources (DBMS)
– Linguistically well-formed textual sources (news articles)
– Well-structured web sources (Wikipedia)
– Web snippets
– Speech transcripts (cf. QAST)
✩ Assumption:
– QA systems for different answer sources share a pool of common components
✩ Service-oriented architecture (SOA) for QA
– Strongly component-oriented approach
– Basis for an open-source QA architecture (cf. EU project QALL-ME)
Overview QA architecture
[Diagram: a QA-Controller with a Strategy Selector coordinates five components — Analysis, Retrieval, Selection, Extraction, and Validation — passing Q-objects, strings, IR queries, sentences, and possible answers between them on the way to final answers. Cross-linguality is handled either before or after analysis ("Before"/"After" methods). Answer sources: Clef corpus, Wikipedia corpus, speech transcripts.]
System Architecture for Clef 2007
Query processing components
Cross-lingual Approach to ODQA
Source question (DE/EN/ES/PT)
→ External MT services produce German/English questions Q1, Q2, Q3
→ German/English Wh-parser yields question objects QO1, QO2, QO3
→ Confidence selection picks the best QO
→ Answer processing
("Before" method: translate first, then analyze)
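A minimal sketch of this "before" strategy: translate the source question with several external MT services, parse each candidate translation with the Wh-parser, and keep the question object with the highest parser confidence. All function names and the dictionary layout here are illustrative assumptions, not the actual DFKI API.

```python
def best_question_object(question, mt_services, wh_parser):
    """Translate `question` with each MT service, parse every
    translation, and return the best-scored question object (QO)."""
    candidates = []
    for translate in mt_services:
        translation = translate(question)      # Q1, Q2, Q3, ...
        qo = wh_parser(translation)            # QO1, QO2, QO3, ...
        if qo is not None:
            candidates.append(qo)
    # Confidence selection: keep the question object the parser
    # was most confident about.
    return max(candidates, key=lambda qo: qo["score"], default=None)

# Toy usage with stubs standing in for real MT services and the parser:
mt_stubs = [lambda q: q + " (mt-a)", lambda q: q + " (mt-bbbb)"]
parser_stub = lambda t: {"text": t, "score": len(t)}  # fake confidence
best = best_question_object("Welche Maler lebten von 1904-1944?",
                            mt_stubs, parser_stub)
```

In practice the confidence would come from the Wh-parser's own analysis score rather than a string length, but the selection logic is the same.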
Question analysis
[Diagram: (translated) NL questions undergo topic processing (LingPipe for resolution), followed by syntactic and semantic analysis, yielding a sequence of NE-resolved Wh-questions. SMES (for DE & EN) produces the Q-object; IA proto-query construction then maps it via the IA schema to an IA proto query for information access.]
Output example of query analysis
Which Jewish painter lived from 1904-1944?
<QOBJ msg="quest" id="qId0" lang="DE" score="1">
  <NL-STRING id="qId0">
    <SOURCE id="qId0" lang="DE">Welche juedischen Maler lebten von 1904-1944?</SOURCE>
    <TARGETS/>
  </NL-STRING>
  <QA-control>
    <Q-FOCUS>Maler</Q-FOCUS>
    <Q-SCOPE>leb</Q-SCOPE>
    <Q-TYPE restriction="TEMP">C-COMPLETION</Q-TYPE>
    <A-TYPE type="list:SOME">NUMBER</A-TYPE>
  </QA-control>
  <KEYWORDS>
    <KEYWORD id="kw0" type="UNIQUE">
      <TK pos="V" stem="leb">lebten</TK>
    </KEYWORD>
    <KEYWORD id="kw1" type="UNIQUE">
      <TK pos="A" stem="juedisch">juedischen</TK>
      …
    </KEYWORD>
  </KEYWORDS>
  <EXPANDED-KEYWORDS/>
  <NE-LIST>
    <NE id="ne0" type="DATE">1944</NE>
    <NE id="ne1" type="DATE">1904</NE>
  </NE-LIST>
</QOBJ>

IA query created for Lucene:
+neTypes:NUMBER AND ("lebten" OR "lebte" OR "gelebt" OR "leben" OR "lebt") AND +maler^4 AND jüdisch^1 AND 1944^1 AND 1904^1
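The IA query at the bottom of the slide can be assembled from the Q-object fields roughly as follows. This is a sketch: `build_ia_query` and its argument layout are my own illustration; only the query syntax (answer-type filter, verb-form disjunction, boosted keywords, named entities) follows the example above.

```python
def build_ia_query(a_type, verb_forms, weighted_keywords, named_entities):
    """Assemble a Lucene-style IA query string from parts of a Q-object:
    an answer-type filter, a disjunction of verb forms, boosted
    keywords, and named entities (each with a default boost of 1)."""
    parts = ["+neTypes:" + a_type]
    parts.append("(" + " OR ".join('"%s"' % v for v in verb_forms) + ")")
    parts.extend("%s^%d" % (term, boost) for term, boost in weighted_keywords)
    parts.extend("%s^1" % ne for ne in named_entities)
    return " AND ".join(parts)

query = build_ia_query(
    "NUMBER",
    ["lebten", "lebte", "gelebt", "leben", "lebt"],
    [("+maler", 4), ("jüdisch", 1)],
    ["1944", "1904"],
)
```

The morphological verb variants would come from the stem (`leb`) in the Q-object; here they are passed in explicitly to keep the sketch self-contained.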
Exploiting Natural Language Generation
Answer processing components
Experiments & Results
Run ID         Right #   Right %   W #   X #   U #
dfki061dedeM     60       30.0     121    14    5
dfki061endeC     37       18.5     144    18    1
dfki061deenC     14        7.0     178     6    2
dfki062esenC     10        5.0     180    10    0
dfki062ptdeC      5        2.5     189     4    2

(W = wrong, X = inexact, U = unsupported; 200 questions per run)

Notes: DE–DE performance is still OK although some questions were lost; the cross-lingual runs suffer from coverage problems of the English Wh-parser and from problems with MT.
Remarks
✩ Online MT services are still insufficient
– Develop our own MT solutions (cf. EU project EuroMatrix)
✩ Poor coverage of our English Wh-parser
– First prototype for Clef 2007
✩ Answer extraction is currently robust enough for different answer sources
– Similar performance for newspaper and Wikipedia sources
✩ More semantic analysis is needed on the answer side, without loss of coverage and domain independence
– We are exploring cognitive semantics (cf. Talmy, 1987)
✩ A number of QA components were also used in the QAST pilot task and AVE
DFKI at QAST and AVE
✩ QAST pilot task
– For a given written factoid question
– Extract the answer from manual or automatic speech transcripts
✩ Answer Validation Exercise
– Given a triple of the form (question, answer, supporting text)
– Decide whether the answer to the question is correct and supported according to the given supporting text
QAST result (encouraging):

Task   #Q   #A   ACC    MRR
T1     98   19   0.15   0.17
T2     98    9   0.09   0.09

(T1 = CHIL corpus, manual transcripts; T2 = CHIL corpus, automatic transcripts)

AVE result (really encouraging):

Runs          Recall   Precision   F-measure   QA Accuracy
dfki07-run1    0.62      0.37        0.46         0.16
dfki07-run2    0.71      0.44        0.55         0.21
DFKI at QAST pilot task
✩ Goals
– Gain experience with this sort of answer source
– Adapt the text-based open-domain QA system we used for the Clef main tasks
– Since QAST required a different set of expected answer types, we developed a federated search strategy for NER called Meta-NER
(Same core as our textual QA system)
META-NER
✩ Call several NERs in parallel
✩ Merge the results by a voting strategy
BiQueNER developed by
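The parallel-call-and-vote idea above can be sketched as follows. The `(start, end, type)` triple representation and the simple majority vote are assumptions for illustration; the real Meta-NER merging strategy may differ.

```python
from collections import Counter

def meta_ner(text, recognizers, min_votes=2):
    """Federated NER sketch in the spirit of Meta-NER: query several
    recognizers and keep every (start, end, type) annotation that at
    least `min_votes` of them agree on. Each recognizer is assumed
    to return a set of (start, end, entity_type) triples."""
    votes = Counter()
    for recognize in recognizers:
        votes.update(set(recognize(text)))   # one vote per recognizer
    return {span for span, n in votes.items() if n >= min_votes}

# Stub recognizers standing in for real NER engines:
ner_a = lambda t: {(0, 6, "PER"), (20, 24, "DATE")}
ner_b = lambda t: {(0, 6, "PER")}
ner_c = lambda t: {(0, 6, "ORG"), (20, 24, "DATE")}
entities = meta_ner("some transcript ...", [ner_a, ner_b, ner_c])
# (0, 6, "PER") and (20, 24, "DATE") get 2 votes each;
# (0, 6, "ORG") gets only 1 vote and is dropped
```

Voting makes the combined recognizer robust against individual errors, which matters on noisy automatic speech transcripts.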
DFKI’s AVE System
✩ The AVE system is based on our RTE system (cf. Wang & Neumann, AAAI-2007, RTE-3 challenge)
✩ The RTE method has already demonstrated good results on QA data
– RTE-3 (QA subset only): 81.5%; Trec-2003 QA: 65.7%
✩ RTE method: a novel sentence-level kernel method
– Subtree alignment on the syntactic level
– Subsequence kernel
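As an illustration of the second ingredient, here is a minimal, unweighted subsequence kernel over token sequences. This is a generic sketch, not the authors' implementation: their kernel additionally uses subtree alignment, and practical subsequence kernels usually add a gap-decay factor, both omitted here.

```python
def subsequence_kernel(s, t, p):
    """Count the common subsequences of length p shared by the token
    sequences s and t (each matching pair of occurrences counted once).
    Standard dynamic programming: K[l][i][j] is the number of matching
    subsequence pairs of length l in the prefixes s[:i] and t[:j]."""
    n, m = len(s), len(t)
    K = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(p + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            K[0][i][j] = 1.0   # the empty subsequence always matches
    for l in range(1, p + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                # inclusion-exclusion over dropping the last token of s or t
                K[l][i][j] = (K[l][i - 1][j] + K[l][i][j - 1]
                              - K[l][i - 1][j - 1])
                if s[i - 1] == t[j - 1]:
                    # extend every length-(l-1) match by this token pair
                    K[l][i][j] += K[l - 1][i - 1][j - 1]
    return K[p][n][m]

# ["a", "b"] shares exactly one length-2 subsequence with itself:
score = subsequence_kernel(["a", "b"], ["a", "b"], 2)
```

For entailment-style validation, s and t would be token (or dependency-label) sequences derived from the hypothesis H and the text T, and the kernel value feeds a classifier.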
AVE architecture
Runs   R      P      F      QA Acc.
run1   0.62   0.37   0.46   0.16
run2   0.71   0.44   0.55   0.21
Error Analysis
✩ Supporting texts from web documents cause parsing problems
✩ Violation of some of our RTE system's assumptions
– Required: H should be "verbally" smaller than T
– Violated: patterns built from Q-A pairs are too long (impact on recall)
✩ If the supporting text is very long (a complete document)
– Impact on precision