Machine Comprehension with Discourse Relations Karthik Narasimhan - PowerPoint PPT Presentation

Machine Comprehension with Discourse Relations
Karthik Narasimhan, Regina Barzilay
CSAIL, Massachusetts Institute of Technology

Sally liked going outside. She put on her shoes. She went outside to walk. [...] Missy the cat meowed to Sally. Sally waved to Missy the cat. [...] Sally hears her name. “Sally, Sally, come home”, Sally’s mom calls out. Sally runs home to her Mom. Sally liked going outside.
Why did Sally put on her shoes?
A) To wave to Missy the cat
B) To hear her name
C) Because she wanted to go outside
D) To come home
Sample passage excerpt and question in a Machine Comprehension task

Reasoning over multiple sentences
[Figure: Accuracy of baseline systems (SWD, RTE, RTE+SWD) on MC500-test, for single- and multi-sentence questions]
Multi-sentential questions are significantly harder than single-sentence ones.
We focus on modeling multi-sentence relations to improve Q&A performance.

Is there only a single relation?
Sentences: "She put on her shoes" / "She went outside to walk"
Causality: Why did Sally put on her shoes?
Temporality: When did Sally put on her shoes?
Relation between two clauses is question-dependent.

Key idea: Learn relations optimized for MC
Traditional approach: Use off-the-shelf discourse analyzers (Source: Feng and Hirst, 2012)
Training data: Q&A pairs
Sally liked going outside. […] Why did Sally put on her shoes? C) Because she wanted to go outside ✓
Hypothesis: Task-based discourse relations can facilitate better Comprehension Q&A

Fully supervised case
Sentences: "She put on her shoes" / "She went outside to walk"
Relation: Causality
Why did Sally put on her shoes?
A) To wave to Missy the cat
B) To hear her name
C) Because she wanted to go outside ✓
D) To come home

Sentences: "She put on her shoes" / "She went outside to walk"
Why did Sally put on her shoes?
A) To wave to Missy the cat
B) To hear her name
C) Because she wanted to go outside ✓
D) To come home

Sally liked going outside. She put on her shoes. She went outside to walk. [...] Missy the cat meowed to Sally. Sally waved to Missy the cat. [...] Sally hears her name. “Sally, Sally, come home”, Sally’s mom calls out. Sally runs home to her Mom. Sally liked going outside.
Why did Sally put on her shoes?
A) To wave to Missy the cat
B) To hear her name
C) Because she wanted to go outside ✓
D) To come home

Key Steps
Identify relevant sentences: "She put on her shoes" / "She went outside to walk"
Infer correct relation: Causality
Select correct answer: Why did Sally put on her shoes? C) Because she wanted to go outside ✓

Three models
‣ Identify most relevant sentence from passage
‣ Expand to a set of sentences
‣ Infer inter-sentential relations

Sally liked going outside. She put on her shoes. She went outside to walk. [...] Missy the cat meowed to Sally. Sally waved to Missy the cat. [...] Sally hears her name. “Sally, Sally, come home”, Sally’s mom calls out. Sally runs home to her Mom. Sally liked going outside.
Why did Sally put on her shoes?
A) To wave to Missy the cat
B) To hear her name
C) Because she wanted to go outside
D) To come home
Notation: Sentence - z, Question - q, Answer - a

Identifying a relevant sentence (Model 1)
‣ Retrieve a single relevant sentence from passage.
‣ Joint model over sentence z and answer choice a, given question q.
$$P(a, z \mid q) = P(z \mid q) \cdot P(a \mid z, q)$$
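The Model 1 factorization can be illustrated with a small numeric sketch. The probability tables below are made-up toy values, and the names `p_z_given_q`, `p_a_given_zq`, `joint`, and `marginal` are hypothetical; the slides do not say how the two distributions are parameterized.

```python
# Toy distributions for one question q; every number here is invented.
p_z_given_q = {"z1": 0.7, "z2": 0.3}   # P(z | q) over two candidate sentences
p_a_given_zq = {                        # P(a | z, q) over two answer choices
    "z1": {"A": 0.1, "C": 0.9},
    "z2": {"A": 0.6, "C": 0.4},
}

def joint(a, z):
    """P(a, z | q) = P(z | q) * P(a | z, q)."""
    return p_z_given_q[z] * p_a_given_zq[z][a]

def marginal(a):
    """P(a | q), obtained by summing the joint over every sentence z."""
    return sum(joint(a, z) for z in p_z_given_q)

for answer in ("A", "C"):
    print(answer, round(marginal(answer), 4))
```

Summing the joint over z is what lets the sentence stay unobserved: only the answer needs a label.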

Identifying a relevant sentence set (Model 2)
Extends Model 1 to select a pair of relevant sentences from passage.
‣ Retrieve a second sentence z_2 conditioned on both question q and the first retrieved sentence z_1.
$$P(a, z_1, z_2 \mid q) = P(z_1 \mid q) \cdot P(z_2 \mid z_1, q) \cdot P(a \mid z_1, z_2, q)$$

Incorporating Relations (Model 3)
Capture inter-sentential relations, modeled as hidden variables.
$$P(a, r, z_1, z_2 \mid q) = P(z_1 \mid q) \cdot P(r \mid q) \cdot P(z_2 \mid z_1, r, q) \cdot P(a \mid z_1, z_2, r, q)$$
Flexibility to induce relations between sentences conditioned on the question.

Learning
‣ Supervision: question-answer pairs.
‣ Marginalize over hidden variables z and r to get P(a | q).
‣ Maximize the following objective (Model 3):
$$L_3(\theta; P_{train}) = \log \sum_{i,j,m,r \in R} \; \sum_{n \in [m-k,\, m+k]} P(a^*_{ij}, z_{im}, z_{in}, r \mid q_{ij}) - \lambda \|\theta\|^2$$

Prediction
For a given question q, simply choose the answer with highest P(a | q).
‣ Marginalize over all hidden variables z and r.
$$\hat{a}_j = \arg\max_k P(a_{jk} \mid q_j)$$
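A minimal sketch of this prediction rule for the Model 3 factorization follows. The function `predict` and its callable arguments are hypothetical names, and the toy distributions are invented; for simplicity the sketch lets z2 range over all sentences, including z1 itself.

```python
from itertools import product

def predict(answers, sentences, relations, p_z1, p_r, p_z2, p_a):
    """Return (best answer, marginal scores).

    p_z1(z1), p_r(r), p_z2(z2, z1, r) and p_a(a, z1, z2, r) are callables
    returning probabilities, all implicitly conditioned on the question q.
    """
    scores = {}
    for a in answers:
        # Marginalize the joint over the hidden variables z1, z2 and r.
        scores[a] = sum(
            p_z1(z1) * p_r(r) * p_z2(z2, z1, r) * p_a(a, z1, z2, r)
            for z1, z2, r in product(sentences, sentences, relations)
        )
    return max(scores, key=scores.get), scores

# Toy setup: two sentences, two relations, two answer choices (invented numbers).
sentences = [0, 1]
relations = ["causal", "temporal"]
rel_prior = {"causal": 0.8, "temporal": 0.2}
ans_given_rel = {"causal": {"A": 0.1, "C": 0.9}, "temporal": {"A": 0.7, "C": 0.3}}

best, scores = predict(
    ["A", "C"], sentences, relations,
    p_z1=lambda z1: 0.5,
    p_r=lambda r: rel_prior[r],
    p_z2=lambda z2, z1, r: 0.5,
    p_a=lambda a, z1, z2, r: ans_given_rel[r][a],
)
print(best, scores)
```

Exhaustive enumeration is affordable here only because the toy spaces are tiny; a real system would restrict z2 to a window around z1.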

Lexical Features
Type 1 (q, z):
‣ Unigram and bigram matches + entity and action matches
Type 2 (q, a, z1, [z2]):
‣ Capture interactions between a, q and sentence(s) (z1, z2).
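As a rough illustration of the Type 1 match features, the sketch below counts distinct unigrams and bigrams shared by a question and a sentence. The tokenizer and the names `tokenize`, `ngrams`, and `match_features` are assumptions for illustration; entity and action matches would need a parser and are left out.

```python
import re

def tokenize(text):
    """Lowercase and keep alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Set of distinct n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def match_features(question, sentence):
    """Counts of distinct unigrams and bigrams shared by q and z."""
    q, z = tokenize(question), tokenize(sentence)
    return {
        "unigram_matches": len(ngrams(q, 1) & ngrams(z, 1)),
        "bigram_matches": len(ngrams(q, 2) & ngrams(z, 2)),
    }

features = match_features("Why did Sally put on her shoes?", "She put on her shoes.")
print(features)
```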

Relational Features
Type 3 (q, r, z1, z2) and Type 4 (q, r):
‣ Inter-sentence distance, presence of relation-specific markers (small seed list) in sentences.
‣ Second-order: cross of above features with entity and action match counts.
‣ Connect question word with relation type (e.g., why and Causality)
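The first-order relational features could look roughly like the sketch below. The seed marker list `SEED_MARKERS` is hypothetical (the slides only say the list is small), as are the feature names and the function `relational_features`; second-order crosses are omitted.

```python
import re

# Hypothetical seed markers per relation; the actual list is not given in the slides.
SEED_MARKERS = {
    "Causal": {"because", "so", "since"},
    "Temporal": {"then", "after", "before"},
}

def relational_features(question, relation, z1, z2, idx1, idx2):
    """Distance, marker-presence and question-word/relation features."""
    tokens = set(re.findall(r"[a-z']+", (z1 + " " + z2).lower()))
    q_word = re.findall(r"[a-z']+", question.lower())[0]
    return {
        "distance": abs(idx1 - idx2),  # inter-sentence distance in the passage
        "has_marker": int(bool(tokens & SEED_MARKERS.get(relation, set()))),
        f"qword={q_word}&rel={relation}": 1,  # ties "why" to Causal, etc.
    }

feats = relational_features(
    "Why did Sally put on her shoes?", "Causal",
    "She put on her shoes.", "She went outside to walk.", 1, 2,
)
print(feats)
```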

Discourse in Q&A
Prior work has shown value of domain-independent discourse relations in Q&A.
‣ Chai and Jin (2004) incorporate discourse processing into context Q&A.
‣ Verberne et al. (2007) use Rhetorical Structure Theory (RST) to relate question topics and answers.
‣ Jansen et al. (2014) use discourse information to improve answer re-ranking for non-factoid Q&A.

Experiments
‣ Data: MCTest (Richardson et al., 2013)
Split    MC160 Passages   MC160 Questions   MC500 Passages   MC500 Questions
Train    70               280               300              1200
Dev      30               120               50               200
Test     60               240               150              600
‣ > 50% of questions require information from multiple sentences.
‣ Evaluation: Answering accuracy with partial credit for ties (as previously used).
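One way to read "partial credit for ties" is sketched below: when several answers share the top score and the correct one is among them, the question earns 1/n credit for an n-way tie. That 1/n convention and the name `accuracy_with_ties` are assumptions here, not something the slides spell out.

```python
def accuracy_with_ties(score_lists, gold_indices):
    """Mean credit over questions; an n-way tie at the top that includes
    the gold answer earns 1/n (assumed convention)."""
    total = 0.0
    for scores, gold in zip(score_lists, gold_indices):
        best = max(scores)
        tied = [i for i, s in enumerate(scores) if s == best]
        if gold in tied:
            total += 1.0 / len(tied)
    return total / len(gold_indices)

# Three toy questions with four answer scores each (invented numbers).
toy_scores = [
    [0.1, 0.2, 0.7, 0.0],   # clear winner, correct      -> credit 1
    [0.5, 0.5, 0.0, 0.0],   # two-way tie incl. gold     -> credit 0.5
    [0.9, 0.1, 0.0, 0.0],   # clear winner, wrong        -> credit 0
]
toy_gold = [2, 0, 1]
print(accuracy_with_ties(toy_scores, toy_gold))
```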

Baselines
Systems from Richardson et al. (2013)
‣ SWD: uses sliding window to count matches between passage words and words in answer.
‣ RTE: utilizes a textual entailment system to determine if answer is entailed by passage.
‣ RTE+SWD: weighted combination of systems above
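The sliding-window idea behind SWD can be sketched as follows. This is a simplified, unweighted version written for illustration: the window size, the tokenizer, and the name `sliding_window_score` are assumptions, and the system in Richardson et al. (2013) may differ in its details.

```python
import re

def tokenize(text):
    """Lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sliding_window_score(passage, answer):
    """Largest number of answer-word matches in any passage window.

    The window is as wide as the number of distinct answer words
    (an assumption made for this sketch).
    """
    words = tokenize(passage)
    target = set(tokenize(answer))
    size = max(len(target), 1)
    best = 0
    for start in range(max(len(words) - size + 1, 1)):
        window = words[start:start + size]
        best = max(best, sum(1 for w in window if w in target))
    return best

passage = "Sally liked going outside. She put on her shoes."
print(sliding_window_score(passage, "Because she wanted to go outside"))
print(sliding_window_score(passage, "To come home"))
```

The answer choice with the highest window score would be selected; pure word overlap like this has no way to connect "put on her shoes" to "wanted to go outside", which is the gap the relation models target.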

Comprehension Accuracy
[Figure: Accuracy of baselines (SWD, RTE, SWD+RTE) compared to our model (Model 3) on MC160 test and MC500 test]

Accuracy by Question Type
[Figure: Comparison of our different model variants (Model 1, Model 2, Model 3) on MC500 test; accuracy for Single, Multi, and Overall questions]

[Figure: RST-augmented Model 2 vs. Model 3 on MC500 test; accuracy for Single, Multi, and Overall questions]
Task-based discourse relations can facilitate better Comprehension Q&A.
77% of the predicted RST relations are Elaboration!

Evaluation using Human judgements
We annotated 240 questions from MC160 test set with most relevant sentence(s) in passage, and relations between sentence pairs.
‣ 103 sentence pairs with annotated relations
‣ 34% of these have relevant discourse markers occurring anywhere in sentences.
‣ Only 9% of sentences have a marker at an end.

Sentence Retrieval
[Figure: bar chart with series Freq, Model 1, Model 2, Model 3 for Single, Multi, and Overall questions]
Table: Recall (@5) of relevant sentences retrieved by different models compared to human judgements.

Relation Prediction
Relation      R@1     R@2
Causal        56.25   75.00
Temporal      27.27   54.54
Explanation   16.66   33.33
Other         57.40   64.81
Overall       51.45   65.04
Table: Recall of annotated relations at various thresholds in ranking produced by Model 3
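Recall at a rank threshold, as reported in this table, can be computed along the lines below. The function name `recall_at_k` and the toy rankings are illustrative assumptions; they are not the annotated data.

```python
def recall_at_k(rankings, gold, k):
    """Percentage of items whose gold label appears in the top k of its ranking."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return 100.0 * hits / len(gold)

# Three toy sentence pairs, each with a model ranking over relations.
toy_rankings = [
    ["Causal", "Temporal", "Other"],
    ["Other", "Causal", "Temporal"],
    ["Temporal", "Other", "Causal"],
]
toy_gold = ["Causal", "Causal", "Causal"]
print(round(recall_at_k(toy_rankings, toy_gold, 1), 2))
print(round(recall_at_k(toy_rankings, toy_gold, 2), 2))
```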

Conclusions
‣ Discourse relations help in the task of machine comprehension Q&A involving multiple sentences.
‣ A task-specific approach of incorporating discourse information does better than using off-the-shelf analyzers.
Code and data will be available at: http://people.csail.mit.edu/karthikn/mcdr/
