SLIDE 1

Evaluation Metrics for Machine Reading Comprehension (RC): Prerequisite Skills and Readability

Sugawara et al., The University of Tokyo, Fujitsu Laboratories Ltd., National Institute of Informatics

Presented by: Shaima AbdulMajeed

SLIDE 2

RC Task

To give the agent the ability to:

  • 1. Read open-domain documents
  • 2. Answer questions about them
SLIDE 3

Goal

Assessing the quality of Reading Comprehension (RC) datasets

SLIDE 4

Why

To know which dataset best evaluates a developed RC system

SLIDE 5

RC dataset example

SLIDE 6

Datasets evaluated

  • QA4MRE
  • MCTest
  • SQuAD
  • Who-did-What (WDW)
  • MS MARCO
  • NewsQA

SLIDE 7

Current dataset metrics

  • Question types
  • Answer types
  • Categories
SLIDE 8

Is that enough?

SLIDE 9

Does the readability of a text correlate with the difficulty of answering questions about it?

SLIDE 10

Evaluation Metrics Proposed

  • 1. Prerequisite skills
  • 2. Readability metrics

SLIDE 11

Prerequisite skills

  • 1. Object tracking
  • 2. Mathematical reasoning
  • 3. Coreference resolution
  • 4. Logical reasoning
  • 5. Analogy
  • 6. Causal relation
  • 7. Spatiotemporal relation
  • 8. Ellipsis
  • 9. Bridging
  • 10. Elaboration
  • 11. Meta-knowledge
  • 12. Schematic clause relation
  • 13. Punctuation
SLIDE 12

Skill 1: Object tracking

Tracking or grasping of multiple objects.
Context: Tom ate apples. Mary ate apples, too.
Q: Who ate apples?
A: Tom and Mary (objects: Tom, Mary)
SLIDE 13

Skill 2: Mathematical reasoning

Statistical, mathematical, and quantitative reasoning.
Context: Tom ate ten apples. Mary ate eight apples.
Q: How many apples did Tom and Mary eat?
A: eighteen
SLIDE 14

Skill 3: Coreference resolution

Detection and resolution of all possible demonstratives.
Context: Tom was hungry. He ate ten apples.
Q: How many apples did Tom eat?
A: ten (Tom = He)
SLIDE 15

Skill 4: Logical reasoning

Understanding of predicate logic.
Context: All students have a pen. Tom is a student.
Q: Does Tom have a pen?
A: Yes (also requires object tracking)
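As a sketch, the inference on this slide can be written in first-order logic (the predicate names are illustrative); the answer follows by universal instantiation and modus ponens:

```latex
\forall x\,\bigl(\mathrm{Student}(x) \rightarrow \mathrm{HasPen}(x)\bigr),\quad
\mathrm{Student}(\mathrm{Tom}) \;\vdash\; \mathrm{HasPen}(\mathrm{Tom})
```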
SLIDE 16

Skill 5: Analogy

Understanding metaphors.
Context: The White House said Trump is open to ...
Q: Did the President of the United States and his staff say Trump is open to ...?
A: Yes ("The White House said" = "POTUS and his staff said ...")
SLIDE 17

Skill 6: Causal relation

Understanding causal relations, typically signaled by cue words such as "why" and "because."
SLIDE 18

Skill 7: Spatiotemporal relation

Understanding spatial and temporal relations.
Context: One day, Tom went to the park. After that, he went to the restaurant. Finally, he went to his grandma's house.
Q: Where did Tom go finally?
A: his grandma's house ("Finally": temporal relation)
SLIDE 19

Skill 8: Ellipsis

Recognizing implicit information.
Example: She is a smart student = She is a student
SLIDE 20

Skill 9: Bridging

Inference supported by grammatical and lexical knowledge.
Example: She loves sushi = She likes sushi
SLIDE 21

Skill 10: Elaboration

Inference using known facts and general knowledge.
Example: The writer of Hamlet was Shakespeare = Shakespeare wrote Hamlet
SLIDE 22

Skill 11: Meta-knowledge

Answering questions about the text itself, e.g.:
Q: Who are the principal characters of the story?
Q: What is the main subject of this article?
SLIDE 23

Skill 12: Schematic clause relation

Understanding of complex sentences that have coordination or subordination.
Context: Tom has a friend whose name is John.
Q: What is the name of Tom's friend?
A: John ("whose" = relative clause)
SLIDE 24

Skill 13: Punctuation

Understanding of punctuation marks.
Context: The AFC champion (Denver Broncos) defeated the NFC champion (Carolina Panthers) in Super Bowl 50.
Q: Which NFL team won Super Bowl 50?
A: Denver Broncos
Note: the parentheses give each champion team's name.
SLIDE 25

Readability metrics

  • 1. Lexical features
  • 2. Syntactic features
  • 3. Traditional features
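Traditional features include classic readability formulas such as the Flesch-Kincaid (F-K) grade level reported in the results later in this deck. A minimal sketch of computing it (the regex-based syllable counter is a rough stand-in, not the paper's implementation):

```python
import re

def count_syllables(word: str) -> int:
    # Very rough heuristic: count runs of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Split into sentences and words with simple regexes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # F-K grade = 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(flesch_kincaid_grade("Tom ate ten apples. Mary ate eight apples."))
```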

SLIDE 26

Annotation procedure (100 questions)

Step 1: Annotators see the context, the question, and its answer simultaneously.
e.g. Q: Why did Tom look angry? A: His sister ate his cake.

Step 2: Select the sentences (from the context) required to answer.
e.g. Context: (C1) Tom is a student. (C2) Tom looks annoyed because his sister ate his cake. (C3) His sister's name is Sylvia.
> Select: C2

Step 3: Select the skills required for answering the question.
e.g. C2: Tom looks annoyed because his sister ate his cake.
Skills: causal relation ("because"), bridging (lexical knowledge that "annoyed" = "angry")
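For concreteness, one way to represent the outcome of this procedure as a data record (the field names are illustrative, not the paper's format):

```python
from dataclasses import dataclass

@dataclass
class QuestionAnnotation:
    question: str
    answer: str
    supporting_sentences: list[int]  # indices of the selected context sentences (Step 2)
    required_skills: list[str]       # subset of the 13 prerequisite skills (Step 3)

# The worked example from this slide:
example = QuestionAnnotation(
    question="Why did Tom look angry?",
    answer="His sister ate his cake.",
    supporting_sentences=[2],                         # C2
    required_skills=["causal relation", "bridging"],  # "because"; annoyed = angry
)
```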

SLIDE 27

Results

  • 1. Prerequisite skills required for each RC dataset
  • 2. Prerequisite skills required per question
  • 3. Readability of each RC dataset
  • 4. Correlation between readability and prerequisite skills required

SLIDE 28

Results 1: Prerequisite skills required for each RC dataset

  • 1. QA4MRE (highest frequency for these skills):
      • Bridging
      • Elaboration
      • Schematic clause relation
      • Punctuation
  • 2. MCTest (highest frequency for these skills):
      • Causal relation
      • Meta-knowledge
      • Coreference resolution
      • Spatiotemporal relation
SLIDE 29

Results 2: Number of prerequisite skills required per question

        QA4MRE   MCTest   SQuAD   WDW    MS MARCO   NewsQA
Avg.    3.25     1.56     1.28    2.43   1.19       1.99

QA4MRE is the highest: its technical documents and questions handcrafted by experts demand the most skills per question.

SLIDE 30

Results: Nonsense/difficult questions

            QA4MRE   MCTest   SQuAD   WDW   MS MARCO   NewsQA
Nonsense    10       1        3      27    14         1

SLIDE 31

Results 3: Readability metrics for each RC dataset

        QA4MRE   MCTest   SQuAD   WDW    MS MARCO   NewsQA
F-K     14.9     3.6      14.6    15.3   12.1       12.6

F-K = Flesch-Kincaid grade level (higher = harder to read).

SLIDE 32

Results 4: Correlation between readability metrics and the number of required prerequisite skills

SLIDE 33

Results 4: Correlation between readability metrics and the number of required prerequisite skills (continued)
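The charts for this result are not reproduced here. As a sketch, such a correlation can be computed per dataset from the per-question annotations; scipy's pearsonr is a standard choice, not necessarily the authors' exact method, and the data below is made up for illustration:

```python
from scipy.stats import pearsonr

# Hypothetical per-question annotations for one dataset: the number of
# required skills (Step 3) and the F-K grade of each question's context.
skills_per_question = [3, 1, 2, 4, 1, 2]
fk_grade_of_context = [14.2, 9.8, 12.1, 15.0, 8.7, 11.3]

r, p = pearsonr(skills_per_question, fk_grade_of_context)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```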

SLIDE 34

Summary

QA4MRE

  • Hard to read
  • Hard to answer

MCTest

  • Easy to read
  • Hard to answer

SQuAD

  • Hard to read
  • Easy to answer
SLIDE 35

How to utilize this study

  • 1. Preparing appropriate datasets for each stage of RC system development:
      I. easy-to-read and easy-to-answer datasets
      II. easy-to-read but difficult-to-answer datasets
      III. difficult-to-read and difficult-to-answer datasets
  • 2. Applying the metrics to evaluate other datasets