

SLIDE 1

To Attend or not to Attend:
 A Case Study on Syntactic Structures for Semantic Relatedness

SLIDE 2

Authors

Amulya Gupta
Email: guptaam@iastate.edu

Zhu (Drew) Zhang
Email: zhuzhang@iastate.edu

Code: https://github.com/amulyahwr/acl2018

SLIDE 3

Agenda

• Introduction
• Classical world
• Alternate world
• Our contribution
• Summary

SLIDE 4

Problem Statement


Given two sentences, determine the semantic similarity between them.

Introduction

SLIDE 5

Tasks

• Semantic relatedness for sentence pairs.
  1. Predict a relatedness score (a real value) for a pair of sentences.
  2. A higher score implies higher semantic similarity between the sentences.
• Paraphrase detection for question pairs.
  1. Given a pair of questions, classify them as a paraphrase pair or not.
  2. Binary classification: 1 = paraphrase, 0 = not a paraphrase.

Essence: Given two sentences, determine the semantic similarity between them.

Introduction

SLIDE 6

Datasets used

• Semantic relatedness for sentence pairs.
  1. SICK (Marelli et al., 2014)
     • Score range: [1, 5]
     • Dataset: 4500/500/4927 (train/dev/test)
  2. MSRpar (Agirre et al., 2012)
     • Score range: [0, 5]
     • Dataset: 750/750 (train/test)
• Paraphrase detection for question pairs.
  1. Quora (Iyer et al., Kaggle, 2017)
     • Binary classification: 1 = paraphrase, 0 = not a paraphrase
     • Dataset: used 50,000 data points out of 400,000; 80% (5%) / 20% (train (dev) / test)

Introduction

SLIDE 7

Examples

Dataset | Sentence 1 | Sentence 2 | Gold score
SICK | The badger is burrowing a hole | A hole is being burrowed by the badger | 4.9
MSRpar | The reading for both August and July is the best seen since the survey began in August 1997. | It is the highest reading since the index was created in August 1997. | 3
Quora | What is bigdata? | Is bigdata really doing well? | (not shown)

Introduction

SLIDE 8

Linear

Generally, a sentence is read in a linear form.
• English (left to right): The badger is burrowing a hole.
• Urdu (right to left): بیج ایک سوراخ پھینک دیتا ہے. (Google Translate)
• Traditional Chinese (top to bottom)

Introduction Classical world

SLIDE 9

Long Short Term Memory (LSTM)

[Diagram: a linear chain of six LSTM cells reading the word embeddings e_The, e_badger, e_is, e_burrowing, e_a, e_hole at time steps 1 through 6.]

Introduction Classical world

SLIDE 10

Long Short Term Memory (LSTM)

[Diagram: the same linear LSTM chain as on the previous slide, with the sixth (final) LSTM cell highlighted.]

Introduction Classical world
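To make the linear reading order concrete, here is a minimal PyTorch sketch of encoding the example sentence with a left-to-right LSTM; the vocabulary, dimensions, and variable names are illustrative and not taken from the authors' repository.

```python
import torch
import torch.nn as nn

# Minimal sketch of a linear (left-to-right) sentence encoder; sizes are illustrative.
vocab = {"The": 0, "badger": 1, "is": 2, "burrowing": 3, "a": 4, "hole": 5}
embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=300)
lstm = nn.LSTM(input_size=300, hidden_size=150, batch_first=True)

tokens = torch.tensor([[vocab[w] for w in "The badger is burrowing a hole".split()]])
outputs, (h_n, c_n) = lstm(embed(tokens))  # outputs: one hidden state per word, shape (1, 6, 150)
sentence_vec = h_n[-1]                     # final hidden state, often used as the sentence summary
```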
SLIDE 11

Attention mechanism

• Neural Machine Translation (NMT) (Bahdanau et al., 2014)
• Global Attention Model (GAM) (Luong et al., 2015)

Introduction Classical world
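As a reference for the attention mechanisms cited above, here is a minimal sketch of global (Luong-style) dot-product attention over a set of encoder states; the function name and shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def global_attention(query, encoder_states):
    """Global attention with dot-product scoring (in the spirit of Luong et al., 2015).

    query:          (hidden,)         the state that is attending
    encoder_states: (seq_len, hidden) all encoder hidden states
    """
    scores = encoder_states @ query        # one score per position
    weights = F.softmax(scores, dim=0)     # attention distribution over positions
    context = weights @ encoder_states     # weighted sum of encoder states
    return context, weights
```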

SLIDE 12

Tree

[Diagram: dependency and constituency parses of "The badger is burrowing a hole". In the dependency tree, "burrowing" is the root, with nsubj → badger (det → The), aux → is, and dobj → hole (det → a).]

Introduction Classical world Alternate world
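A dependency parse like the one above can be inspected with an off-the-shelf parser; the snippet below uses spaCy purely as an illustration (it is not necessarily the toolchain the authors used).

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model, installed separately
doc = nlp("The badger is burrowing a hole")
for token in doc:
    # prints e.g.:  nsubj  badger  <- burrowing
    print(f"{token.dep_:>6}  {token.text:<10} <- {token.head.text}")
```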

SLIDE 13

Tree-LSTM (Tai et al., 2015)

[Diagram: a dependency Tree-LSTM (Tai et al., 2015); each word embedding (e_The, e_badger, e_is, e_burrowing, e_a, e_hole) enters its own T-LSTM cell, and child states are composed bottom-up along the dependency tree into the root cell.]

Introduction Classical world Alternate world
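For reference, below is a compact sketch of the child-sum Tree-LSTM cell of Tai et al. (2015), the composition unit depicted in the diagram; layer names and dimensions are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Child-sum Tree-LSTM cell (Tai et al., 2015); a sketch, not the authors' code."""
    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.iou_x = nn.Linear(in_dim, 3 * mem_dim)   # input/output/update gates from the word
        self.iou_h = nn.Linear(mem_dim, 3 * mem_dim)  # ... and from the summed child states
        self.f_x = nn.Linear(in_dim, mem_dim)         # forget gate, computed per child
        self.f_h = nn.Linear(mem_dim, mem_dim)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) word embedding; child_h, child_c: (num_children, mem_dim).
        # For leaves, pass zero tensors of shape (1, mem_dim) as child_h and child_c.
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou_x(x) + self.iou_h(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x).unsqueeze(0) + self.f_h(child_h))  # one forget gate per child
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c
```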

SLIDE 14

Attention mechanism

Introduction Classical world Alternate world

SLIDE 15

Decomposable Attention (Parikh et al., 2016)

[Diagram: Sentence L (embeddings e1 ... e8) and Sentence R (embeddings e1 ... e4), each used without structural encoding; the model follows the Attend (attention matrix), Compare, and Aggregate steps.]

Introduction Classical world Alternate world
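A minimal sketch of the attend-compare-aggregate pipeline of Parikh et al. (2016), operating directly on word embeddings with no structural encoding; the MLP arguments are placeholders supplied by the caller, and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def decomposable_attention(L, R, compare_mlp, aggregate_mlp):
    """Attend-compare-aggregate over two sentences; shapes and layer names are illustrative.

    L: (len_L, d) embeddings of Sentence L   R: (len_R, d) embeddings of Sentence R
    """
    # Attend: soft-align each token with the other sentence via an attention matrix.
    scores = L @ R.T                              # (len_L, len_R)
    aligned_R = F.softmax(scores, dim=1) @ R      # for every L token, a soft summary of R
    aligned_L = F.softmax(scores, dim=0).T @ L    # for every R token, a soft summary of L

    # Compare: each token against its aligned summary, then sum over tokens.
    v_L = compare_mlp(torch.cat([L, aligned_R], dim=-1)).sum(dim=0)
    v_R = compare_mlp(torch.cat([R, aligned_L], dim=-1)).sum(dim=0)

    # Aggregate: combine both comparison vectors for the final prediction.
    return aggregate_mlp(torch.cat([v_L, v_R], dim=-1))
```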

SLIDE 16

Modified Decomposable Attention (MDA)

[Diagram (Modifications 1 and 2): Sentence L and Sentence R are each encoded with T-LSTM cells into hidden-state sets HL and HR; an attention matrix is computed over HL and HR, and the output is built from two similarity features: h+ (absolute-distance similarity: element-wise absolute difference) and hx (sign similarity: element-wise multiplication).]

Introduction Classical world Alternate world Our contribution

MDA is employed after the sentences have been encoded.
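The two similarity features named on the slide can be wired into an output layer roughly as follows; the hidden size and the 5-way score distribution are assumptions in the style of SICK relatedness models, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SimilarityOutput(nn.Module):
    """Output layer over the two similarity features shown on the slide (a sketch)."""
    def __init__(self, mem_dim, hidden_dim=50, num_classes=5):
        super().__init__()
        self.hidden = nn.Linear(2 * mem_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, h_L, h_R):
        h_plus = torch.abs(h_L - h_R)   # absolute-distance similarity (element-wise |difference|)
        h_times = h_L * h_R             # sign similarity (element-wise multiplication)
        feats = torch.sigmoid(self.hidden(torch.cat([h_plus, h_times], dim=-1)))
        return torch.softmax(self.out(feats), dim=-1)  # distribution over relatedness scores
```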

SLIDE 17

Testset Results

MSRpar

Metric | Linear w/o Attention | Linear MDA | Constituency w/o Attention | Constituency MDA | Dependency w/o Attention | Dependency MDA
Pearson's r | 0.327 | 0.3763 | 0.3981 | 0.3991 | 0.4921 | 0.4016
Spearman's ρ | 0.2205 | 0.3025 | 0.315 | 0.3237 | 0.4519 | 0.331
MSE | 0.8098 | 0.729 | 0.7407 | 0.722 | 0.6611 | 0.7243

SICK

Metric | Linear w/o Attention | Linear MDA | Constituency w/o Attention | Constituency MDA | Dependency w/o Attention | Dependency MDA
Pearson's r | 0.8398 | 0.7899 | 0.8582 | 0.779 | 0.8676 | 0.8239
Spearman's ρ | 0.7782 | 0.7173 | 0.7966 | 0.7074 | 0.8083 | 0.7614
MSE | 0.3024 | 0.3897 | 0.2734 | 0.4044 | 0.2532 | 0.3326

Introduction Classical world Alternate world Our contribution

SLIDE 18

Progressive Attention (PA)

[Diagram (Phase 1): Sentence L is encoded first with T-LSTM cells into hidden states HL (the "Start" of the process); Sentence R is then encoded with its own T-LSTM cells into HR, and at each step an attention vector (a1, a2, a3) over HL and a gating mechanism that mixes a_i with 1 - a_i connect the two encodings.]

Introduction Classical world Alternate world Our contribution

SLIDE 19

Progressive Attention (PA)

[Diagram: the same Progressive Attention figure as on the previous slide, now also showing the two T-LSTM chains with their hidden-state sets HL and HR side by side.]

Introduction Classical world Alternate world Our contribution

SLIDE 20

Progressive Attention (PA)

[Diagram: the hidden-state sets HL and HR from the two T-LSTM chains feed an output layer built from h+ (absolute-distance similarity: element-wise absolute difference) and hx (sign similarity: element-wise multiplication).]

Introduction Classical world Alternate world Our contribution

PA is employed while the sentences are being encoded (whereas MDA is applied after encoding).

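The diagrams above pair an attention vector (a1, a2, a3) with complementary weights (1 - a_i) in a gating mechanism. The sketch below shows only that generic gating pattern, as one illustrative reading of the figure; it is not the paper's exact Progressive Attention equations.

```python
import torch
import torch.nn.functional as F

def gated_attention_step(h_r, H_L):
    """Attention vector plus complementary gate over the other sentence's states (illustrative only).

    h_r: (d,)   hidden state of the T-LSTM cell currently encoding Sentence R
    H_L: (n, d) hidden states of the already-encoded Sentence L
    """
    a = F.softmax(H_L @ h_r, dim=0)      # attention vector (a1, ..., an) over Sentence L
    a = a.unsqueeze(1)                   # (n, 1) for broadcasting
    # Per-position gate: keep a_i of the Sentence L state and 1 - a_i of the current R state.
    return a * H_L + (1.0 - a) * h_r
```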

SLIDE 21

Effectiveness of PA

Introduction Classical world Alternate world Our contribution

ID | Sentence 1 | Sentence 2 | Gold | Linear, no attn | Linear, PA | Constituency, no attn | Constituency, PA | Dependency, no attn | Dependency, PA
1 | The badger is burrowing a hole | A hole is being burrowed by the badger | 4.9 | 2.60 | 3.02 | 3.52 | 4.34 | 3.41 | 4.63

SLIDE 22

Testset Results

MSRpar

Metric | Linear w/o Attention | Linear MDA | Linear PA | Constituency w/o Attention | Constituency MDA | Constituency PA | Dependency w/o Attention | Dependency MDA | Dependency PA
Pearson's r | 0.327 | 0.3763 | 0.4773 | 0.3981 | 0.3991 | 0.5104 | 0.4921 | 0.4016 | 0.4727
Spearman's ρ | 0.2205 | 0.3025 | 0.4453 | 0.315 | 0.3237 | 0.4764 | 0.4519 | 0.331 | 0.4216
MSE | 0.8098 | 0.729 | 0.6758 | 0.7407 | 0.722 | 0.6436 | 0.6611 | 0.7243 | 0.6823

SICK

Metric | Linear w/o Attention | Linear MDA | Linear PA | Constituency w/o Attention | Constituency MDA | Constituency PA | Dependency w/o Attention | Dependency MDA | Dependency PA
Pearson's r | 0.8398 | 0.7899 | 0.8550 | 0.8582 | 0.779 | 0.8625 | 0.8676 | 0.8239 | 0.8424
Spearman's ρ | 0.7782 | 0.7173 | 0.7873 | 0.7966 | 0.7074 | 0.7997 | 0.8083 | 0.7614 | 0.7733
MSE | 0.3024 | 0.3897 | 0.2761 | 0.2734 | 0.4044 | 0.2610 | 0.2532 | 0.3326 | 0.2963

Introduction Classical world Alternate world Our contribution

SLIDE 23

Discussion

Introduction Classical world Alternate world Our contribution

SLIDE 24

Discussion

Introduction Classical world Alternate world Our contribution

• Is it because attention can be considered an implicit form of structure, one that complements the explicit form of syntactic structure?
• If yes, does there exist some tradeoff between the modeling effort invested in syntactic structure and in attention structure?
• Does this mean there is a closer affinity between dependency structure and compositional semantics?
• If yes, is it because dependency structures embody more semantic information?

[Chart: impact of attention plotted against the amount of structural information encoded (Linear, Constituency, Dependency).]

• Gildea (2004): Dependencies vs. Constituents for Tree-Based Alignment

SLIDE 25

Summary

Introduction Classical world Alternate world Our contribution

• Proposed a modified decomposable attention (MDA) model and a novel progressive attention (PA) model on tree-based structures.
• Investigated the impact of the proposed attention models on syntactic structures in linguistics.