SLIDE 1

Question-Answering: Evaluation, Systems, Resources

Ling573 NLP Systems & Applications April 5, 2011

SLIDE 2

Roadmap

— Rounding out dimensions of QA

— Evaluation, TREC

— QA systems: Alternate Approaches

— ISI's Webclopedia
— LCC's PowerAnswer-2 and Palantir
— Insight's Patterns

— Resources

SLIDE 8

Evaluation

— Candidate criteria:

— Relevance
— Correctness
— Conciseness: no extra information
— Completeness: penalize partial answers
— Coherence: easily readable
— Justification

— Tension among criteria

SLIDE 11

Evaluation

— Consistency/repeatability:

— Are answers scored reliably?

— Automation:

— Can answers be scored automatically?
— Required for machine learning tune/test

— Short-answer answer keys:

— Litkowski's patterns (sketch below)
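A minimal sketch of pattern-based automatic scoring in the spirit of Litkowski's answer keys; the question ID, patterns, and example fact are illustrative, not taken from the actual TREC keys:

```python
import re

# Hypothetical answer key: each question ID maps to regex patterns
# that a correct answer string must match.
ANSWER_KEY = {
    "Q1": [r"\b1990\b", r"\bApril\s+1990\b"],   # e.g. "When was Hubble launched?"
}

def score(qid, answer):
    """Return 1 if the answer matches any key pattern for this question, else 0."""
    return int(any(re.search(p, answer) for p in ANSWER_KEY.get(qid, [])))

# score("Q1", "Hubble was launched in April 1990")  -> 1
```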

SLIDE 14

Evaluation

— Classical:

— Return ranked list of answer candidates
— Idea: correct answer higher in list => higher score
— Measure: Mean Reciprocal Rank (MRR; computation sketched below)

— For each question,

— Get reciprocal of rank of first correct answer
— E.g. first correct answer at rank 4 => 1/4
— None correct => 0

— Average over all questions

$$\mathrm{MRR} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{\mathrm{rank}_i}$$
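A minimal sketch of this computation, assuming 1-based ranks with 0 marking a question whose candidate list contained no correct answer:

```python
def mean_reciprocal_rank(ranks):
    """ranks[i] is the 1-based rank of the first correct answer for
    question i, or 0 if no candidate was correct (scored as 0)."""
    return sum(1.0 / r if r > 0 else 0.0 for r in ranks) / len(ranks)

# First correct answers at ranks 1, 4, and none -> (1 + 0.25 + 0) / 3
assert abs(mean_reciprocal_rank([1, 4, 0]) - 1.25 / 3) < 1e-9
```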

SLIDE 20

Dimensions of TREC QA

— Applications:

— Open-domain free text search
— Fixed collections
— News, blogs

— Users:

— Novice

— Question types:

— Factoid -> list, relation, etc.

— Answer types:

— Predominantly extractive, short answer in context

— Evaluation:

— Official: human; proxy: patterns

— Presentation: one interactive track

SLIDE 27

Webclopedia

— Webclopedia system:

— Information Sciences Institute (ISI), USC
— Factoid QA: brief phrasal factual answers

— Prior approaches:

— Form query, retrieve passages, slide window over passages
— Pick window with highest score, e.g. # desirable words: overlap with query content terms (sketch below)

— Issues:

— Imprecise boundaries: window vs. NP/name
— Word overlap-based: synonyms?
— Single window: discontinuous answers?
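A minimal sketch of that window-overlap baseline; the window width and tokenization are assumptions, not ISI's settings:

```python
def best_window(passage_tokens, query_terms, width=25):
    """Slide a fixed-width window over the passage and keep the span
    with the most overlap with the query's content terms."""
    terms = {t.lower() for t in query_terms}
    best_span, best_score = passage_tokens[:width], -1
    for i in range(max(1, len(passage_tokens) - width + 1)):
        window = passage_tokens[i:i + width]
        score = sum(1 for t in window if t.lower() in terms)
        if score > best_score:
            best_span, best_score = window, score
    return best_span, best_score
```

This is exactly the approach the slide criticizes: the returned span is a fixed-width token window rather than an NP or name, and scoring is literal word overlap, so synonyms and discontinuous answers are missed.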

SLIDE 30

Webclopedia Improvements

— Syntactic-semantic question analysis
— QA pattern matching
— Classify QA types to improve answer type ID
— Use robust syntactic-semantic parser for analysis
— Combine word- and syntactic info for answer selection

SLIDE 31

Webclopedia Architecture

— Query parsing
— Query formulation
— IR
— Segmentation
— Segment ranking
— Segment parsing
— Answer pinpointing & ranking
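A runnable schematic of the stage ordering only; every helper below is a trivial stub standing in for ISI's actual components:

```python
def parse(text):
    """Stub standing in for the CONTEX syntactic-semantic parse."""
    return {"text": text, "tokens": text.split()}

def formulate(q_parse):
    """Stub: query formulation from the question parse."""
    return [t for t in q_parse["tokens"] if t.isalpha()]

def retrieve(query):
    """Stub: IR over a fixed collection."""
    return ["Later in the day, the president returned to Washington, "
            "the capital of the United States."]

def segment(docs):
    """Stub: split retrieved documents into passages."""
    return [s for doc in docs for s in doc.split(". ") if s]

def rank_segments(segments, query):
    """Stub: rank passages by overlap with the query terms."""
    return sorted(segments, key=lambda s: -sum(t in s for t in query))

def pinpoint(segment_parses, q_parse):
    """Stub: answer pinpointing & ranking over parsed segments."""
    return segment_parses[0]["text"] if segment_parses else None

def answer(question):
    """Stage ordering of the Webclopedia pipeline (schematic only)."""
    q_parse = parse(question)                        # query parsing
    query = formulate(q_parse)                       # query formulation
    docs = retrieve(query)                           # IR
    ranked = rank_segments(segment(docs), query)     # segmentation + segment ranking
    segment_parses = [parse(s) for s in ranked[:5]]  # segment parsing
    return pinpoint(segment_parses, q_parse)         # answer pinpointing & ranking
```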

SLIDE 36

Webclopedia QA Typology

— Issue: many ways to express the same info need

— What is the age of the Queen of Holland? How old is the Netherlands' Queen? …

— Analyzed 17K+ answers.com questions -> 79 typology nodes

— Nodes include:

— Question & answer examples:

— Q: Who was Johnny Mathis' high school track coach?
— A: Lou Vasquez, track coach of…and Johnny Mathis

— Question & answer templates (matching sketched below):

— Q: who be <entity>'s <role>, who be <role> of <entity>
— A: <person>, <role> of <entity>

— Qtarget: semantic type of the answer
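A simplified, hypothetical rendering of how one such template pair might be matched; the regex and the lemmatization assumption are mine, not Webclopedia's:

```python
import re

# One Typology template pair, simplified:
#   Q: who be <role> of <entity>   ->   A: <person>, <role> of <entity>
# Assumes the question has been lowercased and lemmatized ("was" -> "be").
WHO_ROLE_OF = re.compile(r"^who be (?:the )?(?P<role>.+?) of (?P<entity>.+?)\??$")

def match_template(lemma_question):
    m = WHO_ROLE_OF.match(lemma_question)
    if m is None:
        return None
    # Qtarget is a <person>; the answer pattern to seek in candidate
    # passages is "<person>, <role> of <entity>".
    return {"qtarget": "PERSON",
            "role": m.group("role"),
            "entity": m.group("entity")}

# match_template("who be the track coach of johnny mathis")
# -> {'qtarget': 'PERSON', 'role': 'track coach', 'entity': 'johnny mathis'}
```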


slide-38
SLIDE 38

Question & Answer Parsing

— CONTEX parser:

— Trained on growing collection of questions

slide-39
SLIDE 39

Question & Answer Parsing

— CONTEX parser:

— Trained on growing collection of questions

— Original version parsed questions badly

slide-40
SLIDE 40

Question & Answer Parsing

— CONTEX parser:

— Trained on growing collection of questions

— Original version parsed questions badly

— Also identifies Qtargets and Qargs:

— Qtargets:

slide-41
SLIDE 41

Question & Answer Parsing

— CONTEX parser:

— Trained on growing collection of questions

— Original version parsed questions badly

— Also identifies Qtargets and Qargs:

— Qtargets:

— Parts of speech — Semantic roles in parse tree — Elements of Typology + additional info

slide-42
SLIDE 42

Question & Answer Parsing

— CONTEX parser:

— Trained on growing collection of questions

— Original version parsed questions badly

— Also identifies Qtargets and Qargs:

— Qtargets:

— Parts of speech — Semantic roles in parse tree — Elements of Typology + additional info

E.g. Who is Betsy Ross?

— Qtarget: WHY-FAMOUS-PERSON; Qargs: “Betsy Ross”

SLIDE 43

Question & Answer Parsing

— CONTEX parser:

— Trained on a growing collection of questions
— Original version parsed questions badly

— Also identifies Qtargets and Qargs:

— Qtargets:

— Parts of speech
— Semantic roles in parse tree
— Elements of the Typology + additional info

— E.g. Who is Betsy Ross?

— Qtarget: WHY-FAMOUS-PERSON; Qargs: "Betsy Ross"

— Extracted based on 276 hand-written rules (one sketched below)

— 10% of questions: no Qtarget found
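A hypothetical stand-in for one such hand-written rule, assuming plain regex matching rather than CONTEX parse trees:

```python
import re

# "Who is/was <ProperName>?" -> Qtarget WHY-FAMOUS-PERSON, Qarg = the name.
WHO_IS = re.compile(r"^Who (?:is|was) ((?:[A-Z][\w.'-]*\s*)+)\?$")

def qtarget(question):
    m = WHO_IS.match(question)
    if m:
        return "WHY-FAMOUS-PERSON", m.group(1).strip()
    return None, None   # ~10% of questions get no Qtarget

# qtarget("Who is Betsy Ross?") -> ('WHY-FAMOUS-PERSON', 'Betsy Ross')
```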

SLIDE 44

Answer Matching

— Matches:

— QA patterns in parse tree
— Qtarget and Qwords in parse tree
— Words in window

SLIDE 49

Enhancing Word-based Match

— Qtarget-specific knowledge: narrow

— Quantities: e.g. population

— Q: What is the population of New York? – plausible magnitudes: 100K+, M+
— Biased to typical mean values

— Abbreviations / expansions:

— Q: What is NAFTA?
— Check that the answer includes N, A, F, T, and A (sketch below)

— Zip code, phone number, etc. patterns/NER

— Parse information:

— Link discontinuous answer information
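A minimal sketch of the acronym check; the stop-word list is an assumption:

```python
def matches_acronym(acronym, expansion):
    """Check that the initials of a candidate expansion spell the
    acronym, skipping common function words."""
    stop = {"of", "for", "the", "and", "a", "an", "in", "on"}
    initials = [w[0].upper() for w in expansion.split()
                if w.lower() not in stop]
    return "".join(initials) == acronym.upper()

# matches_acronym("NAFTA", "North American Free Trade Agreement") -> True
```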

SLIDE 55

QTarget: External Knowledge

— WordNet:

— Glosses provide additional world knowledge (lookup sketched below)

— Resolve definition questions:

— Q: What is the Milky Way?
— Candidate 2: the galaxy that contains the Earth
— WordNet: Milky Way—the galaxy containing the solar system

— Provide implicit information:

— Q1: What is the capital of the United States?
— S1: Later in the day, the president returned to Washington, the capital of the United States
— WordNet: Washington—the capital of the United States
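A minimal sketch of a gloss-overlap lookup using NLTK's WordNet interface; this is an illustration of the idea, not Webclopedia's code:

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

def gloss_overlap(term, candidate_tokens):
    """Score a candidate answer by word overlap between its tokens and
    the WordNet gloss (definition) of the question term."""
    candidate = {t.lower() for t in candidate_tokens}
    best = 0
    for synset in wn.synsets(term.replace(" ", "_")):
        gloss_words = set(synset.definition().lower().split())
        best = max(best, len(gloss_words & candidate))
    return best

# gloss_overlap("Milky Way", "the galaxy that contains the Earth".split())
# shares words ("the", "galaxy") with WordNet's gloss for Milky Way,
# supporting the definition candidate above.
```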

SLIDE 56

Results

— TREC rank 2 (tied):

— Contribution: Qtarget, word window, QA patterns
— QA patterns too specific (matched only 4% of answers)
— Qtarget classification had the biggest impact