Language-Independent Answer Prediction from the Web



SLIDE 1

Language-Independent Answer Prediction from the Web

Alejandro Figueroa & Günter Neumann

Language Technology Laboratory
German Research Center for Artificial Intelligence
DFKI, Saarbrücken

SLIDE 2

Motivation: From Search Engines to Answer Engines

  • User query: keywords, Wh-clause, question text
  • Search engines: the user still carries the major effort in understanding
  • Answer engines: shift more "interpretation effort" to machines
  • Experience-based interactive QA cycles

SLIDE 3

Web-based Question Answering: State of the Art (SOA)

[Diagram: pipeline with the components NL-Question Analysis, Search Engine, Document processing, Query Refinement, and Answer Extraction. Input: NL-Question. Output: Exact answer 1, Exact answer 2, … Data passed between components: NL-Query; Q-type, a-type, NL-query; IR-Query; fetched documents; snippets.]

Problematic: NL-question-driven query refinement. Usually a large set of snippets/documents has to be consulted.

e.g., AskMSR, Mulder, NSIR, SmartWeb, …

SLIDE 4

Web-based Question Answering: Our Approach

[Diagram: pipeline with the components NL-Query Analysis, Search Engine, Document processing, Answer Prediction, and Answer Extraction. Input: NL-Question. Output: Exact answer 1, Exact answer 2, … Data passed between components: NL-string; Q-type, a-type, NL-query; fetched documents; snippets; AP-phrases.]

SLIDE 5

Answer prediction...

… identifies and extracts those substrings (called the AP-phrases) from a set of snippets/text fragments

– which are either paraphrases of substrings of the NL question, or
– which contain exact answer strings
– E.g., "Who is the prime minister of Great Britain?"

  • Possible AP-phrases: "United Kingdom", "Tony Blair"

  • Notice:

– Often, the first N snippets do not contain the answer and even point to answerless documents (we cannot rely on the first 3-5 snippets/documents; ~low MRR of snippets)
– We need to consider several snippets/documents in order to make use of redundancy (only checking the first is not enough even if it contains the answer)
– Snippets are usually not very well-formed linguistically
– They are computed online with very fast, but cheap methods

SLIDE 6

Where is the TAL conference series taking place in 2006?

SLIDE 7

SLIDE 8

Technological Roadmap for this Work

  • Data-driven: no initial query expansion/refinement without initial documents
  • Language-independent answer prediction: no NLP components, no language model
  • Unsupervised data management: no parameter smoothing, no restrictions on the length of AP-phrases, no fixed window size

How far can we go with this?

SLIDE 9

Answer Prediction as text zooming: core steps

  • Input: N-snippets + NL-question
  • Document construction: token set W, sentence set S
  • Ranking of sentences
  • Extraction of AP-phrases
  • Ranking of AP-phrases
  • Extraction of exact answers
  • Output: answers

Shallow answer extraction: only relevant so far for performing the evaluation of answer prediction

SLIDE 10

Document Construction/Representation

  • Sentences from snippets & NL question

– Only very simple/local tokenizer & sentence markers

  • Collect global statistics

– Word pair-distance frequency (respecting order)

$$X_{sik} = \begin{cases} 1 & \text{if the word } \omega_i \text{ is in } S_s \text{ at position } k \\ 0 & \text{otherwise} \end{cases}$$

$$freq(\omega_i, \omega_j, \epsilon) = \sum_{s=1}^{\sigma} \sum_{k=\epsilon+1}^{len(S_s)} X_{si(k-\epsilon)} \, X_{sjk}$$

  • A document D is represented by the following elements:

$$D = \{ \langle \omega_i, \omega_j, \epsilon, freq(\omega_i, \omega_j, \epsilon) \rangle ,\ \forall i, j, \epsilon ,\ 0 \le \epsilon \le \Upsilon ,\ freq(\omega_i, \omega_j, \epsilon) > 0 \}$$

Υ: length of the longest sentence in D

All possible units: |W| x |W| x Υ; |W| is relatively small (N-snippets)
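The word pair-distance statistic above can be illustrated with a minimal Python sketch. It assumes a whitespace tokenizer with lowercasing as the "very simple" tokenizer; the function names `tokenize` and `pair_distance_freq` are hypothetical and not taken from the slides. The distance ε = 0 is included, so that a word paired with itself yields its plain occurrence count.

```python
from collections import Counter


def tokenize(sentence):
    """Hypothetical stand-in for the simple/local tokenizer: lowercase, split on whitespace."""
    return sentence.lower().split()


def pair_distance_freq(sentences):
    """Count freq(w_i, w_j, eps): how often w_i occurs exactly eps positions before w_j.

    Order is respected: (a, b, 1) and (b, a, 1) are different entries.
    """
    freq = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for k, w_j in enumerate(tokens):
            # eps runs from 0 (the word itself) up to the distance to the sentence start
            for eps in range(0, k + 1):
                freq[(tokens[k - eps], w_j, eps)] += 1
    return freq


snippets = ["the president of france", "the president of chile"]
freq = pair_distance_freq(snippets)
print(freq[("the", "president", 1)])     # 2
print(freq[("president", "france", 2)])  # 1
print(freq[("president", "the", 1)])     # 0, because order matters
```

Only triples with a positive count are stored, which corresponds to the condition freq > 0 in the definition of D.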

SLIDE 11

Ranking of Sentences

  • A matrix for each sentence:

$$M_{ij}(S_s) = \begin{cases} freq(\omega_i, \omega_j, \epsilon) & \text{if } i < j \\ freq(\omega_j, \omega_i, \epsilon) & \text{if } i > j \\ 0 & \text{otherwise} \end{cases}$$

Here i and j are word positions in S_s, and ε = |i − j| is the distance between them.

  • Filtering: ignore low-frequency elements (ζ = 2 globally).

$$\forall i, j: \quad M_{ij} \le \zeta \Rightarrow M_{ij} = 0$$

  • rank(S_s) is given by the maximal eigenvalue of M:

$$rank(S_s) = \lambda_{max}(M(S_s))$$

This eigenvalue gives the amount of "syntactic bonding force" captured by the eigenvector related to λmax.

To avoid a bias toward long sequences of weakly correlated words

SLIDE 12

Ranking Sentences - Remarks

  • Retrieval of snippets is biased by terms in the query

  • Snippets do not only consist of query terms; they also consist of enriched contextual information

  • Our ranking schema identifies strong syntactic patterns in snippets, using them to rank the sentences in the snippets

  • A highly ranked sentence does not necessarily contain query terms, but might contain the answer

  • What is the difference between this approach and a ranking based on n-grams (e.g., AskMSR)?

– We do not have any dependency on lengths
– We do not need to estimate back-off probabilities
– We do not have the problem that long sentences will tend to have a lower rank than short sentences

SLIDE 13

Determination of AP-phrases

  • Idea: Sequences of pairs of words which occur with a high frequency in M (i.e., in a sentence) are chains of related words, that is, our AP-phrases.

  • Words that do not have a strong relation with any other word in S_s are replaced with a "*" -> defines cutting points for sentences

  • Example:

– "The president of France went on Holidays yesterday"
– "The president of France * * on Holidays *"
– "The president of France", "on Holidays"
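The cutting step can be sketched as below, assuming a filtered sentence matrix is already available. A word is kept when its row has at least one non-zero entry and is replaced by "*" otherwise; the matrix values are made up for illustration, and `extract_ap_phrases` is a hypothetical name.

```python
def extract_ap_phrases(tokens, matrix):
    """Mask unrelated words with '*' and split the sentence at the masks.

    tokens: words of one sentence
    matrix: filtered matrix M(S_s); row i is all zeros when word i has no
            strong relation with any other word in the sentence
    """
    marked = [tok if any(matrix[i]) else "*" for i, tok in enumerate(tokens)]
    phrases, current = [], []
    for tok in marked:
        if tok == "*":
            if current:  # a '*' closes the running phrase
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return marked, phrases


tokens = "The president of France went on Holidays yesterday".split()
# Hypothetical strong word pairs (by position) that survived the filtering.
strong_pairs = [(0, 1), (1, 2), (2, 3), (5, 6)]
matrix = [[0] * len(tokens) for _ in tokens]
for i, j in strong_pairs:
    matrix[i][j] = matrix[j][i] = 3

marked, phrases = extract_ap_phrases(tokens, matrix)
print(phrases)  # ['The president of France', 'on Holidays']
```

Because phrases are delimited only by the masks, there is no fixed window size and no limit on phrase length.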

SLIDE 14

Ranking of AP-phrases

  • For each AP-phrase we combine its bi-gram statistics (global context via snippets) with the rank of its embedding sentence (local context):

$$rank(\upsilon) = rank(S_s) * \sum_{b=2}^{\beta} P(B_b \mid B_{b-1})$$

$$P(B_b \mid B_{b-1}) = \log(freq(B_{b-1}, B_b, 1)) - \log(freq(B_{b-1}))$$

The log reduces the trend to favor highly frequent words.

Note: an AP-phrase can be mapped to different rank values (if it is extracted from different sentences) -> keep only the highest-ranked one.
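A minimal sketch of this ranking step follows, under two loudly stated assumptions: the bigram scores of a phrase are summed, and freq with a single argument is the plain occurrence count of that word. Function names and counts are hypothetical and only illustrate the arithmetic.

```python
import math


def bigram_score(prev_word, word, pair_freq, word_freq):
    """P(B_b | B_{b-1}) = log(freq(B_{b-1}, B_b, 1)) - log(freq(B_{b-1})).

    Both counts must be positive; this holds for words kept in an AP-phrase.
    """
    return math.log(pair_freq[(prev_word, word)]) - math.log(word_freq[prev_word])


def rank_ap_phrase(words, sentence_rank, pair_freq, word_freq):
    """Sentence rank (local context) times the summed bigram scores (global context).

    ASSUMPTION: the scores are aggregated by a sum.
    """
    total = sum(bigram_score(a, b, pair_freq, word_freq)
                for a, b in zip(words, words[1:]))
    return sentence_rank * total


def keep_best(scored_phrases):
    """A phrase extracted from several sentences keeps only its highest rank."""
    best = {}
    for phrase, score in scored_phrases:
        if phrase not in best or score > best[phrase]:
            best[phrase] = score
    return best


# Made-up counts for illustration only.
pair_freq = {("tony", "blair"): 3, ("prime", "minister"): 3}
word_freq = {"tony": 4, "prime": 3}

scored = [
    ("tony blair", rank_ap_phrase(["tony", "blair"], 3.0, pair_freq, word_freq)),
    ("tony blair", rank_ap_phrase(["tony", "blair"], 5.0, pair_freq, word_freq)),
    ("prime minister", rank_ap_phrase(["prime", "minister"], 3.0, pair_freq, word_freq)),
]
best = keep_best(scored)
```

Since each bigram score is a log of a ratio of counts, it is zero when the second word always follows the first and negative otherwise.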

SLIDE 15

How to Measure the Quality of an AP-phrase?

  • Remember: An AP-phrase is either a paraphrase of a substring of the query or an exact answer string.

– Basically no NLP components
– Data-driven, language independent

  • We assume that: The distribution of answers in the ranking gives a notion of the potential quality of the answer prediction strategy

– Use the ranked AP-phrases for extracting exact answers
– Use simple answer extractors simulating a standard QA

SLIDE 16

Shallow Answer Extraction

  • First step: determine the EAT (expected answer type) just by looking up Wh-forms

    Keywords                                                    EAT
    Wer, Who, Quién, Quem                                       Person
    Wo, Where, Dónde, Onde                                      Location
    Wann, When, Cuándo, Qué año, Welchem Jahr, Que ano          Date

  • Second step: extract terms as exact answer candidates, basically

– "*" for query terms and ccw
– "*" for numeric characters (who) / non-numeric characters (when)
– Who/when: answers are terms separated by "*"
– Where: terms that match a location name in WordNet (via Babelfish)
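The first step amounts to a keyword lookup, which can be sketched as follows. The keyword table mirrors the one on the slide; the matching strategy (whole-word, case-insensitive search, longest keyword first) and the name `expected_answer_type` are assumptions made for this sketch.

```python
import re

# Wh-forms from the slide, mapped to their expected answer type (EAT).
WH_FORMS = {
    "wer": "Person", "who": "Person", "quién": "Person", "quem": "Person",
    "wo": "Location", "where": "Location", "dónde": "Location", "onde": "Location",
    "wann": "Date", "when": "Date", "cuándo": "Date",
    "qué año": "Date", "welchem jahr": "Date", "que ano": "Date",
}


def expected_answer_type(question):
    """Return the EAT of a question, or None if no known Wh-form occurs.

    Longer keywords are tried first so that multi-word forms such as
    'welchem jahr' are not shadowed by shorter ones.
    """
    text = question.lower()
    for keyword in sorted(WH_FORMS, key=len, reverse=True):
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            return WH_FORMS[keyword]
    return None


print(expected_answer_type("Who is the prime minister of Great Britain?"))  # Person
print(expected_answer_type("¿Dónde nació Picasso?"))                        # Location
print(expected_answer_type("In welchem Jahr fiel die Berliner Mauer?"))     # Date
```

No parser or tagger is involved, which keeps this step in line with the "basically no NLP components" goal.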

SLIDE 17

Experiments

  • CLEF 2004 corpus

– QA pairs from 1994/95 newspaper texts
– When/Who/Where questions for 4 languages

  • N = 30 snippets and Google API

  • Two types of answers:

– Exact: exact matching with the answer provided by CLEF.
– Inexact: not exact answers, but very close answers:
    – Where: not only the city name; the country name is also correct.
    – Who: variants like "G. Bush", "George W. Bush".
    – When: "61945", "1945".

  • Inexact answers are important, because we aim to assess the quality of predicted answers.

SLIDE 18

Experiments

SLIDE 19

Experiments

SLIDE 20

Experiments - Discussion

  • The distribution of answers gives the notion of the quality of the ranking strategy

  • Our results do not behave in the same way for all kinds of questions and languages:

– Question types
– The shallow nature of the answer extractors
– The redundancy on the Web
– Different numbers of mentions of a term

  • Lita & Carbonell (2004) report an MRR of 0.447 for 296 English temporal questions with exact answer matching

  • We conclude that our approach is at least competitive

  • Experimental QA web system is online

– About 5-8 seconds per QA cycle
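For reference, the MRR (mean reciprocal rank) used in this comparison can be computed as in the sketch below: each question contributes 1/rank of its first correct candidate, or 0 if none is correct. The candidate lists and gold answers are invented toy data, not results from the experiments, and `mean_reciprocal_rank` is a hypothetical name.

```python
def mean_reciprocal_rank(ranked_candidates, gold_answers):
    """Average of 1/rank of the first correct candidate per question.

    ranked_candidates: one ranked list of answer strings per question
    gold_answers:      one set of accepted answer strings per question
    """
    if not ranked_candidates:
        return 0.0
    total = 0.0
    for candidates, gold in zip(ranked_candidates, gold_answers):
        for position, candidate in enumerate(candidates, start=1):
            if candidate in gold:
                total += 1.0 / position
                break  # only the first correct candidate counts
    return total / len(ranked_candidates)


# Toy data: correct at rank 1, correct at rank 2, never correct.
ranked = [["tony blair", "united kingdom"], ["paris", "london"], ["1999"]]
gold = [{"tony blair"}, {"london"}, {"1945"}]
print(mean_reciprocal_rank(ranked, gold))  # 0.5
```

Accepting a larger gold set per question (for example name variants) is one way to model the inexact matching described earlier.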

SLIDE 21

Future Work

  • Query refinement and bootstrapping

– Exploring user feedback

  • Data-driven approach for answer extraction

– "Exploring Genetic Algorithms", Master Thesis by Figueroa, submitted

  • Explore the method of answer prediction for other applications, e.g., clustering of sequences and recognition of paraphrases

SLIDE 22

The End! Thank you for your attention