Top-down Tree Long Short-Term Memory Networks
Xingxing Zhang, Liang Lu, Mirella Lapata
School of Informatics, University of Edinburgh
12th June, 2016
Zhang et al., 2016 Tree LSTM 12th June, 2016 1 / 18

Sequential Language Models
$$P(S = w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_{1:i-1}) \tag{1}$$
State of the art is based on the Long Short-Term Memory network language model (Hochreiter and Schmidhuber, 1997; Sundermeyer et al., 2012).
Billion-word benchmark results reported in Jozefowicz et al. (2016):
Models             PPL
KN5                67.6
LSTM               30.6
LSTM+CNN INPUTS    30.0
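A minimal sketch of how Eq. (1) and the perplexity (PPL) column relate, assuming a conditional model is available as a plain function. The names `sentence_log_prob`, `perplexity`, and `uniform_model` are hypothetical and only serve the illustration; the toy model is not one of the systems in the table.

```python
import math

def sentence_log_prob(words, cond_prob):
    """Sum log P(w_i | w_1..w_{i-1}) over the sentence (Eq. 1 in log space)."""
    total = 0.0
    for i, word in enumerate(words):
        total += math.log(cond_prob(word, words[:i]))
    return total

def perplexity(log_prob, num_words):
    """Perplexity is the exponentiated negative average log-probability."""
    return math.exp(-log_prob / num_words)

# Hypothetical toy model: a uniform distribution over a 4-word vocabulary.
def uniform_model(word, history):
    return 1.0 / 4

words = ["the", "cars", "sold", "well"]
log_p = sentence_log_prob(words, uniform_model)
ppl = perplexity(log_p, len(words))
```

For a uniform model the perplexity equals the vocabulary size, which is a quick sanity check on the two functions.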

Will tree structures help LMs?
Probably yes.
LMs based on constituency parsing (Chelba and Jelinek, 2000; Roark, 2001; Charniak, 2001)
LMs based on dependency parsing (Shen et al., 2008; Zhang, 2009; Sennrich, 2015)

LSTMs + Dependency Trees = TreeLSTMs
Why? Sentence length N vs. tree height log(N)
How? Top-down generation; breadth-first search reminiscent of Eisner (1996)
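The length-versus-height contrast can be checked on a small example. The sketch below computes the height of a dependency tree from a list of head indices. The head assignment for the deck's example sentence is an assumed parse chosen for illustration, and `tree_height` is a hypothetical helper. Note that log(N) describes reasonably balanced trees; a fully chain-shaped tree still has height N - 1.

```python
def tree_height(heads):
    """heads[i] is the index of word i's head, or -1 if word i is the root.

    Returns the largest number of edges between the root and any word.
    """
    def depth(i):
        steps = 0
        while heads[i] != -1:
            i = heads[i]
            steps += 1
        return steps
    return max(depth(i) for i in range(len(heads)))

words = ["The", "luxury", "auto", "manufacturer", "last", "year",
         "sold", "1,214", "cars", "in", "the", "U.S."]
# Assumed parse (illustration only): "sold" is the root.
heads = [3, 3, 3, 6, 5, 6, -1, 8, 6, 6, 11, 9]

height = tree_height(heads)                      # path length in the tree
chain_height = tree_height([-1] + list(range(11)))  # worst case: a chain
```

Under this assumed parse, the 12-word sentence has a tree of height 3, so the longest dependency path a recurrent state must cover is much shorter than the sentence.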

Generation Process (Unlabeled Trees)
The luxury auto manufacturer last year sold 1,214 cars in the U.S.

Tree LSTM
$$P(S) = \prod_{i=1}^{n} P(w_i \mid w_{1:i-1}) \tag{2}$$
⇓
$$P(S \mid T) = \prod_{w \in \mathrm{BFS}(T) \setminus \mathrm{root}} P(w \mid \mathcal{D}(w)) \tag{3}$$
$\mathcal{D}(w)$ is the dependency path of $w$.
$\mathcal{D}(w)$ is a generated sub-tree.
Works on projective and unlabeled dependency trees.
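A rough sketch of the factorization in Eq. (3), under simplifying assumptions: the dependency path D(w) is approximated here by the chain of ancestors from ROOT down to the head of w, and dependents of each head are visited left side first, nearest to the head first. Both choices, the assumed parse, and all function names are illustrative and may differ from the authors' implementation.

```python
import math
from collections import deque

ROOT = -1  # index of the artificial ROOT node

def children_of(heads):
    """Map each head to its dependents: left ones (nearest first), then right."""
    kids = {}
    for dep, head in enumerate(heads):
        kids.setdefault(head, []).append(dep)
    ordered = {}
    for head, deps in kids.items():
        left = sorted((d for d in deps if d < head), reverse=True)
        right = sorted(d for d in deps if d > head)
        ordered[head] = left + right
    return ordered

def bfs_with_paths(heads):
    """Yield (word_index, ancestor_path) in breadth-first order, skipping ROOT."""
    kids = children_of(heads)
    queue = deque([(ROOT, [])])
    while queue:
        node, path = queue.popleft()
        if node != ROOT:
            yield node, path
        for child in kids.get(node, []):
            queue.append((child, path + [node]))

def tree_log_prob(words, heads, cond_prob):
    """Sum log P(w | D(w)) over BFS(T) without ROOT (Eq. 3 in log space)."""
    total = 0.0
    for idx, path in bfs_with_paths(heads):
        context = ["ROOT" if p == ROOT else words[p] for p in path]
        total += math.log(cond_prob(words[idx], context))
    return total

words = ["The", "luxury", "auto", "manufacturer", "last", "year",
         "sold", "1,214", "cars", "in", "the", "U.S."]
heads = [3, 3, 3, 6, 5, 6, -1, 8, 6, 6, 11, 9]  # assumed parse, illustration only

order = [idx for idx, _ in bfs_with_paths(heads)]
log_p = tree_log_prob(words, heads, lambda word, context: 0.25)  # toy model
```

The design point is that each word is conditioned on a path whose length is bounded by the tree height rather than by its position in the sentence.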

Tree LSTM

One Limitation of Tree LSTM

Left Dependent Tree LSTM

Experiments

MSR Sentence Completion Challenge
Training set: 49 million words (around 2 million sentences)
Development set: 4000 sentences
Test set: 1040 completion questions

Dependency Parsing Reranking
Rerank 2nd Order MSTParser (McDonald and Pereira, 2006).
We train TreeLSTM and LdTreeLSTM as language models.
We only use words as input features; POS tags, dependency labels, and composition features are not used.
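One way such reranking could be set up is sketched below. The slide does not say how the parser score and the language-model score are combined, so the weighted sum, the `weight` parameter, and every name and number in the example are assumptions made for illustration.

```python
def rerank(candidates, lm_log_prob, weight=1.0):
    """Pick the candidate parse with the highest combined score.

    candidates: list of (heads, parser_score) pairs from a k-best parser.
    lm_log_prob: function mapping a heads list to a tree LM log-probability.
    weight: interpolation weight (hypothetical tuning parameter).
    """
    def combined(candidate):
        heads, parser_score = candidate
        return parser_score + weight * lm_log_prob(heads)
    return max(candidates, key=combined)[0]

# Toy 2-best list for a three-word sentence; all scores are made up.
k_best = [([1, -1, 1], -2.0), ([-1, 0, 1], -1.5)]
toy_lm_scores = {(1, -1, 1): -3.0, (-1, 0, 1): -5.0}

def toy_lm(heads):
    return toy_lm_scores[tuple(heads)]

best = rerank(k_best, toy_lm)                       # LM overturns the parser's choice
parser_only = rerank(k_best, toy_lm, weight=0.0)    # parser score alone
```

Setting `weight` to zero recovers the parser's own first choice, which makes the contribution of the language model easy to isolate.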

Dependency Parsing Reranking
NN: Chen & Manning, 2014; S-LSTM: Dyer et al., 2015

Tree Generation
Four binary classifiers: Add Left? Add Right? Add Next Left? Add Next Right?
Example decision sequence: Add Left? No! Add Right? Yes! Add Next Right? No! Add Left? Yes! Add Next Left? No!
Features: hidden states and word embeddings
Classifiers     Accuracies
Add-Left        94.3
Add-Right       92.6
Add-Nx-Left     93.4
Add-Nx-Right    96.0
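The interplay of the four classifiers might look roughly like the loop below. It assumes that, for each node taken in breadth-first order, the left side is handled before the right side, with Add-Left or Add-Right asked first and the corresponding Add-Nx question asked after every new dependent. The `decide` and `predict_word` callbacks are hypothetical stand-ins for the trained classifiers and the word model; here they simply replay a scripted toy tree.

```python
from collections import deque

def generate_tree(root, predict_word, decide, max_nodes=20):
    """Grow an unlabeled tree top-down and breadth-first.

    predict_word(head, side, siblings) -> word  (hypothetical word model)
    decide(name, head, siblings) -> bool        (hypothetical classifier)
    name is "Add-Left", "Add-Nx-Left", "Add-Right", or "Add-Nx-Right".
    """
    edges = []                      # (head, dependent, side) triples
    queue = deque([root])
    count = 1
    while queue:
        head = queue.popleft()
        for side, first, following in (("left", "Add-Left", "Add-Nx-Left"),
                                       ("right", "Add-Right", "Add-Nx-Right")):
            siblings = []
            question = first
            while count < max_nodes and decide(question, head, siblings):
                word = predict_word(head, side, siblings)
                siblings.append(word)
                edges.append((head, word, side))
                queue.append(word)
                count += 1
                question = following   # later dependents use the Add-Nx question
    return edges

# Scripted toy tree used in place of trained models (illustration only).
script = {"sold": {"left": ["manufacturer"], "right": ["cars"]},
          "manufacturer": {"left": ["the"], "right": []}}

def scripted_decide(name, head, siblings):
    side = "left" if "Left" in name else "right"
    return len(siblings) < len(script.get(head, {}).get(side, []))

def scripted_word(head, side, siblings):
    return script[head][side][len(siblings)]

edges = generate_tree("sold", scripted_word, scripted_decide)
```

The `max_nodes` cap is a safety guard added for the sketch so that a classifier that always answers yes cannot loop forever.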

Tree Generation

Conclusions
Syntax can help language modeling.
Predicting tree structures with neural networks is possible.
Next steps: Sequence to Tree Models; Tree to Tree Models
Code available: https://github.com/XingxingZhang/td-treelstm
Thanks & Questions?
