SLIDE 1 Question-Answering: Evaluation, Systems, Resources
Ling573 NLP Systems & Applications April 5, 2011
SLIDE 2
Roadmap
Dimensions of QA
Evaluation, TREC
QA systems: Alternate Approaches
ISI’s Webclopedia
LCC’s PowerAnswer-2 and Palantir
Insight’s Patterns
Resources
SLIDES 3–8
Evaluation
Candidate criteria:
Relevance
Correctness
Conciseness: no extra information
Completeness: penalize partial answers
Coherence: easily readable
Justification
Tension among criteria
SLIDES 9–11
Evaluation
Consistency/repeatability:
Are answers scored reliably?
Automation:
Can answers be scored automatically? Required for machine learning tune/test
Short-answer answer keys
Litkowski’s patterns
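The pattern-based scoring idea can be sketched as follows: an answer key maps each question to regular expressions, and a candidate counts as correct if any pattern matches. This is a minimal illustration of the approach; the patterns shown are made up for this example, not Litkowski's actual TREC answer key.

```python
import re

# Illustrative answer-key patterns (NOT the official TREC key):
# each question maps to regexes that an acceptable answer must match.
ANSWER_PATTERNS = {
    "How old is the Netherlands' Queen?": [r"\b7[12]\b", r"\bseventy[- ](one|two)\b"],
}

def pattern_correct(question: str, answer: str) -> bool:
    """Return True if any answer-key pattern matches the candidate answer."""
    return any(re.search(p, answer, re.IGNORECASE)
               for p in ANSWER_PATTERNS.get(question, []))
```

Because matching is fully automatic, such keys make the repeated scoring runs needed for machine-learning tuning feasible, at the cost of occasionally rewarding spurious string matches.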
SLIDES 12–14
Evaluation
Classical:
Return ranked list of answer candidates
Idea: correct answer higher in list => higher score
Measure: Mean Reciprocal Rank (MRR)
For each question, take the reciprocal of the rank of the first correct answer
E.g. first correct answer at rank 4 => 1/4; none correct => 0
Average over all questions:
MRR = (1/N) * sum_{i=1..N} 1/rank_i
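The MRR computation above is a few lines of code. A minimal sketch, using `None` to mark questions where no returned candidate was correct:

```python
def mean_reciprocal_rank(first_correct_ranks):
    """MRR over a set of questions.

    first_correct_ranks: one entry per question -- the 1-based rank of the
    first correct answer in the system's ranked list, or None if no
    candidate was correct (contributing 0 to the average).
    """
    scores = [0.0 if r is None else 1.0 / r for r in first_correct_ranks]
    return sum(scores) / len(scores)

# The slide's example: a first correct answer at rank 4 contributes 1/4.
mean_reciprocal_rank([1, 4, None])  # = (1 + 1/4 + 0) / 3
```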
SLIDES 15–20
Dimensions of TREC QA
Applications:
Open-domain free-text search
Fixed collections: news, blogs
Users:
Novice
Question types:
Factoid -> list, relation, etc.
Answer types:
Predominantly extractive, short answer in context
Evaluation:
Official: human; proxy: patterns
Presentation:
One interactive track
SLIDES 21–27
Webclopedia
Webclopedia system:
Information Sciences Institute (ISI), USC
Factoid QA: brief phrasal factual answers
Prior approaches:
Form query, retrieve passages, slide window over passages
Pick window with highest score
E.g. # desirable words: overlap with query content terms
Issues:
Imprecise boundaries: window vs. NP/name
Word-overlap-based: synonyms?
Single window: discontinuous answers?
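The prior approach the slide criticizes can be sketched in a few lines: slide a fixed-width window over a retrieved passage and score each window by overlap with the query's content terms. Window width and the scoring function here are illustrative choices, not any particular system's parameters.

```python
def best_window(passage_tokens, query_terms, width=10):
    """Return (score, window) for the window with the most query-term overlap."""
    query = {t.lower() for t in query_terms}
    best = (0, [])
    for i in range(max(1, len(passage_tokens) - width + 1)):
        window = passage_tokens[i:i + width]
        # Score = number of window tokens that are query content terms.
        score = sum(1 for t in window if t.lower() in query)
        if score > best[0]:
            best = (score, window)
    return best
```

The sketch makes the slide's criticisms concrete: exact string overlap gives no credit to synonyms, the window edges are arbitrary rather than aligned to an NP or name, and a single contiguous window cannot capture a discontinuous answer.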
SLIDES 28–30
Webclopedia Improvements
Syntactic-semantic question analysis
QA pattern matching
Classify QA types to improve answer type identification
Use robust syntactic-semantic parser for analysis
Combine word- and syntactic info for answer selection
SLIDE 31
Webclopedia Architecture
Query parsing -> query formulation -> IR -> segmentation -> segment ranking -> segment parsing -> answer pinpointing & ranking
SLIDES 32–36
Webclopedia QA Typology
Issue: many ways to express the same information need
E.g. What is the age of the Queen of Holland? How old is the Netherlands’ Queen? …
Analyzed 17K+ answers.com questions -> 79 typology nodes
Nodes include:
Question & answer examples:
Q: Who was Johnny Mathis' high school track coach?
A: Lou Vasquez, track coach of…and Johnny Mathis
Question & answer templates:
Q: who be <entity>'s <role>; who be <role> of <entity>
A: <person>, <role> of <entity>
Qtarget: semantic type of the answer
SLIDE 37
Webclopedia QA Typology
SLIDES 38–43
Question & Answer Parsing
CONTEX parser:
Trained on a growing collection of questions
Original version parsed questions badly
Also identifies Qtargets and Qargs
Qtargets:
Parts of speech
Semantic roles in the parse tree
Elements of the typology + additional info
E.g. Who is Betsy Ross?
Qtarget: WHY-FAMOUS-PERSON; Qargs: “Betsy Ross”
Extracted based on 276 hand-written rules
10%: no Qtarget
SLIDE 44
Answer Matching
Matches:
QA patterns in the parse tree
Qtarget and Qwords in the parse tree
Words in window
SLIDES 45–49
Enhancing Word-based Match
Qtarget-specific knowledge: narrow
Quantities: e.g. population
Q: What is the population of New York? – 100K+, M+
Biased toward typical mean values
Abbreviations/expansions:
Q: What is NAFTA?
Check that the answer includes N, A, F, T, and A
Zip code, phone number, etc.: patterns/NER
Parse information:
Link discontinuous answer information
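The abbreviation check can be sketched as: verify that the candidate expansion supplies each letter of the acronym, in order, as a word initial. This is a simplified heuristic for illustration, not Webclopedia's actual rule.

```python
def expansion_matches(acronym, candidate):
    """True if the candidate's word initials contain the acronym's letters in order."""
    initials = [w[0].upper() for w in candidate.split() if w]
    it = iter(initials)
    # Membership tests against the iterator consume it, enforcing in-order matching.
    return all(letter in it for letter in acronym.upper())

expansion_matches("NAFTA", "North American Free Trade Agreement")  # True
expansion_matches("NAFTA", "a trade agreement")                    # False
```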
SLIDES 50–55
Qtarget: External Knowledge
WordNet:
Glosses provide additional world knowledge
Resolve definition questions:
Q: What is the Milky Way?
Candidate 2: the galaxy that contains the Earth
WordNet: Milky Way – the galaxy containing the solar system
Provide implicit information:
Q1: What is the capital of the United States?
S1: Later in the day, the president returned to Washington, the capital of the United States
WordNet: Washington – the capital of the United States
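The gloss-matching idea can be sketched as scoring a candidate answer by content-word overlap with the gloss of the question focus. The tiny gloss table below is a hand-coded stand-in for a WordNet lookup, and the stopword list is an illustrative choice.

```python
# Stand-in for WordNet glosses (illustrative entries from the slide's examples).
GLOSSES = {
    "Milky Way": "the galaxy containing the solar system",
    "Washington": "the capital of the United States",
}
STOPWORDS = {"the", "a", "an", "of", "that", "contains", "containing"}

def gloss_overlap(focus, candidate):
    """Count content words shared by the candidate answer and the focus's gloss."""
    gloss = set(GLOSSES.get(focus, "").lower().split()) - STOPWORDS
    cand = set(candidate.lower().split()) - STOPWORDS
    return len(gloss & cand)
```

For the slide's definition question, the candidate "the galaxy that contains the Earth" shares the content word "galaxy" with the gloss, while an unrelated candidate shares nothing, which is the evidence the gloss provides.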
SLIDE 56
Results
TREC rank 2 (tied)
Contributions: Qtarget, word window, QA patterns
QA patterns too specific (matched only 4% of answers)
Qtarget classification had the biggest impact