Evaluation Metrics for Machine Reading Comprehension (RC): Prerequisite Skills and Readability
Sugawara et al. The University of Tokyo, Fujitsu Laboratories Ltd., National Institute of Informatics Presented by: Shaima AbdulMajeed
To give the agent the ability to:
To know the quality of reading comprehension (RC) datasets
To know which dataset best evaluates the RC system being developed
Understanding of punctuation marks
Context: The AFC champion (Denver Broncos) defeated the NFC champion (Carolina Panthers) in Super Bowl 50.
Q: Which NFL team won Super Bowl 50?
A: Denver Broncos
Note: the parentheses give each champion team's name.
Step 1: Annotators see the context, the question, and its answer at the same time.
e.g. Q: Why did Tom look angry? A: His sister ate his cake.
Step 2: Select sentences (from the context).
e.g. Context: (C1) Tom is a student. (C2) Tom looks annoyed because his sister ate his cake. (C3) His sister's name is Sylvia.
Step 3: Select the skills required for answering the question.
e.g. C2: Tom looks annoyed because his sister ate his cake.
Skills: causal relation ("because"), bridging (lexical knowledge that "annoyed" = "angry")
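The three annotation steps can be pictured as filling in one record per question. The sketch below is only an illustration: the class name Annotation, its field names, and the helper average_skill_count are hypothetical and do not come from the paper or the slides.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Annotation:
    """One annotated question (hypothetical schema, for illustration only)."""
    question: str                      # Step 1: shown together with the answer
    answer: str
    selected_sentences: list[str]      # Step 2: sentences picked from the context
    skills: list[str] = field(default_factory=list)  # Step 3: required skills


def average_skill_count(annotations: list[Annotation]) -> float:
    """Mean number of prerequisite skills per annotated question."""
    return mean(len(a.skills) for a in annotations)


# The Tom example from the slide, encoded as a single record.
example = Annotation(
    question="Why did Tom look angry?",
    answer="His sister ate his cake.",
    selected_sentences=["Tom looks annoyed because his sister ate his cake."],
    skills=["causal relation", "bridging"],
)

print(average_skill_count([example]))
```

Averaging the length of the skills list over all annotated questions of a dataset gives a per-dataset figure for how many skills a question needs.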
Average number of prerequisite skills required, by dataset:

Dataset   QA4MRE  MCTest  SQuAD  WDW   MS MARCO  NewsQA
Avg       3.25    1.56    1.28   2.43  1.19      1.99

QA4MRE is the highest: technical documents, with questions handcrafted by experts.
Dataset   QA4MRE  MCTest  SQuAD  WDW   MS MARCO  NewsQA
Nonsense  10      1       3      27    14        1
Dataset   QA4MRE  MCTest  SQuAD  WDW   MS MARCO  NewsQA
F-K       14.9    3.6     14.6   15.3  12.1      12.6
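F-K here is the Flesch-Kincaid grade level, which is computed as 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. A minimal sketch of that formula follows. The sentence splitter and the vowel-group syllable counter are crude assumptions made for illustration, so the scores will not match the tool used to produce the figures above.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


simple = "Tom is a student. His sister ate his cake."
dense = ("Prerequisite skills characterize comprehension difficulty "
         "independently of superficial readability.")

print(flesch_kincaid_grade(simple), flesch_kincaid_grade(dense))
```

Longer sentences and longer words both push the grade up, which is why a children's-story passage scores far lower than an encyclopedic or technical one.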
Results 4- Correlation between readability metrics and the number of required prerequisite skills
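To make the idea of this correlation concrete, here is a small sketch of a Pearson correlation coefficient applied to the six dataset-level figures from the earlier slides (F-K grade and average number of skills). This is not the analysis reported in the paper: six averages are far too few for a meaningful estimate, and the function name pearson is just an illustrative helper.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Order: QA4MRE, MCTest, SQuAD, WDW, MS MARCO, NewsQA (values from the slides).
fk_grade = [14.9, 3.6, 14.6, 15.3, 12.1, 12.6]
avg_skills = [3.25, 1.56, 1.28, 2.43, 1.19, 1.99]

r = pearson(fk_grade, avg_skills)
print(f"Pearson r between F-K grade and average skill count: {r:.2f}")
```

A value near 0 means readability tells little about how many skills a question needs, while a value near 1 or -1 means the two move together.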
[Figure labels: QA4MRE, MCTest, SQuAD. Annotations: "easy-to-read and easy-to-answer"; "easy-to-read but difficult-to-answer dataset".]