SLIDE 1

Natural Language Processing with Deep Learning CS224N/Ling284

Christopher Manning Lecture 18: Tree Recursive Neural Networks, Constituency Parsing, and Sentiment

SLIDE 2

Lecture Plan:

Lecture 18: Tree Recursive Neural Networks, Constituency Parsing, and Sentiment

  • 1. Motivation: Compositionality and Recursion (10 mins)
  • 2. Structure prediction with simple Tree RNN: Parsing (20 mins)
  • 3. Backpropagation through Structure (5 mins)
  • 4. More complex TreeRNN units (35 mins)
  • 5. Other uses of tree-recursive neural nets (5 mins)
  • 6. Institute for Human-Centered Artificial Intelligence (5 mins)

SLIDE 3
  • 1. The spectrum of language in CS

SLIDE 4

Semantic interpretation of language – not just word vectors. How can we work out the meaning of larger phrases?

  • The snowboarder is leaping over a mogul
  • A person on a snowboard jumps into the air

People interpret the meaning of larger text units – entities, descriptive terms, facts, arguments, stories – by semantic composition of smaller elements

SLIDE 5

Compositionality

SLIDE 6

SLIDE 7

Language understanding – & Artificial Intelligence – requires being able to understand bigger things from knowing about smaller parts

SLIDE 8

SLIDE 9

Are languages recursive?

  • Cognitively somewhat debatable (need to head to infinity)
  • But: recursion is natural for describing language
  • [The person standing next to [the man from [the company that purchased [the firm that you used to work at]]]]

  • noun phrase containing a noun phrase containing a noun phrase
  • It’s a very powerful prior for language structure

SLIDE 10

Penn Treebank tree

SLIDE 11
  • 2. Building on Word Vector Space Models

How can we represent the meaning of longer phrases?

By mapping them into the same vector space!

[Figure: a 2-D word vector space with Monday, Tuesday, France, and Germany plotted as points, and the phrases "the country of my birth" and "the place where I was born" mapped to nearby points in the same space]

SLIDE 12

How should we map phrases into a vector space?

[Figure: the word vectors of "the country of my birth" are composed bottom-up into phrase vectors]

Use the principle of compositionality: the meaning (vector) of a sentence is determined by (1) the meanings of its words and (2) the rules that combine them.

Models in this section can jointly learn parse trees and compositional vector representations


Socher, Manning, and Ng. ICML, 2011

SLIDE 13

Constituency Sentence Parsing: What we want

[Figure: constituency parse tree (S, NP, VP, PP) over "The cat sat on the mat.", with a vector at every word and phrase node]

SLIDE 14

Learn Structure and Representation

[Figure: the parse tree over "The cat sat on the mat." again, now with learned vector representations at every node]

SLIDE 15

Recursive vs. recurrent neural networks

[Figure: a recursive (tree-structured) network vs. a recurrent (chain-structured) network, each composing the phrase "the country of my birth" into a single vector]

SLIDE 16

Recursive vs. recurrent neural networks

  • Recursive neural nets require a tree structure
  • Recurrent neural nets cannot capture phrases without prefix context and often capture too much of the last words in the final vector

[Figure: the same recursive vs. recurrent composition of "the country of my birth"]

SLIDE 17

Recursive Neural Networks for Structure Prediction

[Figure: a neural network takes the vectors of two candidate children from "The cat sat on the mat." and outputs a parent vector together with a score (1.3)]

Inputs: two candidate children's representations. Outputs:

  • 1. The semantic representation if the two nodes are merged.
  • 2. Score of how plausible the new node would be.

SLIDE 18

Recursive Neural Network Definition

$\text{score} = U^T p$, where $p = \tanh\left(W \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + b\right)$

Same W parameters at all nodes of the tree

[Figure: a neural network combines the child vectors c1 and c2 into the parent vector p and its score]
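As a concrete illustration (a minimal NumPy sketch, not the course code; the dimension d and the parameter names W, b, U mirror the definition above):

```python
import numpy as np

d = 4                                         # dimensionality of every node vector (assumed)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))    # single composition matrix shared by all nodes
b = np.zeros(d)
U = rng.normal(scale=0.1, size=d)             # scoring vector

def compose_and_score(c1, c2):
    """Merge two child vectors into a parent vector p and a plausibility score."""
    p = np.tanh(W @ np.concatenate([c1, c2]) + b)
    return p, U @ p
```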

SLIDE 19

Parsing a sentence with an RNN (greedily)

[Figure: candidate neural-network scores for merging each adjacent pair of nodes in "The cat sat on the mat."]

SLIDE 20

Parsing a sentence

[Figure: the highest-scoring pair has been merged into a new node, and candidate scores are recomputed over the remaining adjacent pairs of "The cat sat on the mat."]

SLIDE 21

Parsing a sentence

[Figure: greedy merging continues, building larger constituents of "The cat sat on the mat."]

SLIDE 22

Parsing a sentence

[Figure: the completed parse tree over "The cat sat on the mat.", with a vector at every node]
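The greedy procedure illustrated on these slides can be sketched as follows (illustrative only; it reuses the hypothetical compose_and_score from the SLIDE 18 sketch):

```python
# assumes compose_and_score(c1, c2) from the SLIDE 18 sketch is defined
def greedy_parse(word_vectors):
    """Repeatedly merge the best-scoring adjacent pair until one node remains."""
    nodes = list(word_vectors)                # current frontier of tree nodes (vectors)
    total_score = 0.0
    while len(nodes) > 1:
        # score every adjacent pair of candidate children
        candidates = [compose_and_score(nodes[i], nodes[i + 1])
                      for i in range(len(nodes) - 1)]
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, score = candidates[best]
        total_score += score
        nodes[best:best + 2] = [parent]       # replace the two children with their new parent
    return nodes[0], total_score
```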

SLIDE 23

Max-Margin Framework - Details

  • The score of a tree is computed as the sum of the parsing decision scores at each node
  • x is the sentence; y is the parse tree

SLIDE 24

Max-Margin Framework - Details

  • Similar to max-margin parsing (Taskar et al. 2004), a supervised max-margin objective (written out below)
  • The loss penalizes all incorrect decisions
  • Structure search for A(x) was greedy (join the best pair of nodes each time)
  • Instead: beam search with a chart
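Written out (a reconstruction following the max-margin parsing setup cited above, not text copied from the slide), the tree score and the objective are roughly:

$s(x, y) = \sum_{n \in \text{nodes}(y)} s_n$

$J(\theta) = \sum_i \Big[ \max_{y \in A(x_i)} \big( s(x_i, y) + \Delta(y, y_i) \big) - s(x_i, y_i) \Big]$

where $A(x_i)$ is the set of candidate trees for sentence $x_i$ and $\Delta(y, y_i)$ is a structured margin that grows with the number of incorrect decisions in $y$.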

SLIDE 25

Scene Parsing

  • The meaning of a scene image is also a function of smaller regions,
  • how they combine as parts to form larger objects,
  • and how the objects interact.

Similar principle of compositionality.

SLIDE 26

Algorithm for Parsing Images

Same Recursive Neural Network as for natural language parsing! (Socher et al. ICML 2011)

[Figure: Parsing Natural Scene Images: image segments (grass, tree, people, building) with their features are recursively merged into semantic representations of larger regions]

SLIDE 27

Multi-class segmentation

Method                                              Accuracy
Pixel CRF (Gould et al., ICCV 2009)                 74.3
Classifier on superpixel features                   75.9
Region-based energy (Gould et al., ICCV 2009)       76.4
Local labelling (Tighe & Lazebnik, ECCV 2010)       76.9
Superpixel MRF (Tighe & Lazebnik, ECCV 2010)        77.5
Simultaneous MRF (Tighe & Lazebnik, ECCV 2010)      77.5
Recursive Neural Network                            78.1

Stanford Background Dataset (Gould et al. 2009)

SLIDE 28
  • 3. Backpropagation Through Structure

Introduced by Goller & Küchler (1996) – old stuff! Principally the same as general backpropagation. Calculations resulting from the recursion and tree structure:

  • 1. Sum derivatives of W from all nodes (like RNN)
  • 2. Split derivatives at each node (for tree)
  • 3. Add error messages from parent + node itself

$\delta^{(l)} = \left( (W^{(l)})^T \delta^{(l+1)} \right) \circ f'(z^{(l)}), \qquad \frac{\partial}{\partial W^{(l)}} E_R = \delta^{(l+1)} \left(a^{(l)}\right)^T + \lambda W^{(l)}$

SLIDE 29

BTS: 1) Sum derivatives of all nodes

You can actually assume it's a different W at each node. Intuition via example: if we take separate derivatives of each occurrence, we get the same result:

SLIDE 30

BTS: 2) Split derivatives at each node

During forward prop, the parent is computed using its 2 children. Hence, the errors need to be computed with respect to each of them, where each child's error is n-dimensional:

$p = \tanh\left(W \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + b\right)$

[Figure: the error at the parent p is split between the two children c1 and c2]

SLIDE 31

BTS: 3) Add error messages

  • At each node:
  • What came up (fprop) must come down (bprop)
  • Total error messages = error messages from parent + error message from own score

[Figure: error messages from the parent and from the node's own score flow down into c1 and c2]

SLIDE 32

BTS Python Code: forwardProp
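The code on this slide is an image and is not reproduced in the transcript. A minimal sketch of what such a forward pass might look like, assuming a simple binary Node class (names here are illustrative, not the course's actual code):

```python
import numpy as np

class Node:
    def __init__(self, word_vec=None, left=None, right=None):
        self.left, self.right = left, right
        self.h = word_vec                     # leaf: word vector; internal node: filled in below

def forward_prop(node, W, b):
    """Bottom-up pass computing h = tanh(W [h_left; h_right] + b) at every internal node."""
    if node.left is None:                     # leaf node
        return node.h
    h_l = forward_prop(node.left, W, b)
    h_r = forward_prop(node.right, W, b)
    node.h = np.tanh(W @ np.concatenate([h_l, h_r]) + b)
    return node.h
```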

SLIDE 33

BTS Python Code: backProp

$\delta^{(l)} = \left( (W^{(l)})^T \delta^{(l+1)} \right) \circ f'(z^{(l)}), \qquad \frac{\partial}{\partial W^{(l)}} E_R = \delta^{(l+1)} \left(a^{(l)}\right)^T + \lambda W^{(l)}$
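Again the slide's code is an image; below is a sketch of the matching backward pass for the same assumed Node class from the forwardProp sketch, following the three steps above (sum the W gradients over all nodes, split the error between the two children, and let each child add it to its own error):

```python
import numpy as np

def back_prop(node, delta, W, grads):
    """delta is dL/d(node.h) arriving from above; grads accumulates the 'W' and 'b' gradients."""
    if node.left is None:                      # leaf: word vectors treated as fixed here
        return
    delta_z = delta * (1.0 - node.h ** 2)      # back through tanh
    children = np.concatenate([node.left.h, node.right.h])
    grads["W"] += np.outer(delta_z, children)  # step 1: derivatives of W summed over all nodes
    grads["b"] += delta_z
    down = W.T @ delta_z                       # step 2: split the error message between the children
    d = node.h.shape[0]
    back_prop(node.left, down[:d], W, grads)   # step 3: each child adds this to its own error
    back_prop(node.right, down[d:], W, grads)
```

Called after a forward pass as back_prop(root, dL_droot, W, {"W": np.zeros_like(W), "b": np.zeros(d)}).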

SLIDE 34

Discussion: Simple TreeRNN

  • Decent results with single layer TreeRNN
  • A single weight matrix TreeRNN could capture some phenomena but is not adequate for more complex, higher-order composition and for parsing long sentences
  • There is no real interaction between the input words
  • The composition function is the same for all syntactic categories, punctuation, etc.

[Figure: a single matrix W composes c1 and c2 into p, and Wscore produces the score s]

SLIDE 35
  • 4. Version 2: Syntactically-Untied RNN
  • A symbolic Context-Free Grammar (CFG) backbone is adequate for basic syntactic structure
  • We use the discrete syntactic categories of the children to choose the composition matrix (see the sketch below)
  • A TreeRNN can do better with a different composition matrix for different syntactic environments
  • The result gives us better semantics

[Socher, Bauer, Manning, Ng 2013]
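A sketch of the untying idea (illustrative; the category inventory and the dictionary lookup are assumptions, not the CVG implementation): the composition matrix is chosen by the children's syntactic categories rather than shared everywhere.

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
categories = ["NP", "VP", "PP", "DT", "NN"]               # coarse categories supplied by the PCFG
W_pair = {(a, b): rng.normal(scale=0.1, size=(d, 2 * d))  # one composition matrix per
          for a in categories for b in categories}        # (left-child, right-child) category pair
b_comp = np.zeros(d)

def compose_su(c1, cat1, c2, cat2):
    """Syntactically-untied composition: W depends on the children's categories."""
    W = W_pair[(cat1, cat2)]
    return np.tanh(W @ np.concatenate([c1, c2]) + b_comp)
```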

SLIDE 36

Compositional Vector Grammars

  • Problem: speed. Every candidate score in beam search needs a matrix-vector product.
  • Solution: compute the score only for a subset of trees coming from a simpler, faster model (PCFG)
  • Prunes very unlikely candidates for speed
  • Provides coarse syntactic categories of the children for each beam candidate
  • Compositional Vector Grammar = PCFG + TreeRNN

SLIDE 37

Related Work for parsing

  • The resulting CVG parser is related to previous work that extends PCFG parsers
  • Klein and Manning (2003a): manual feature engineering
  • Petrov et al. (2006): a learning algorithm that splits and merges syntactic categories
  • Lexicalized parsers (Collins, 2003; Charniak, 2000): describe each category with a lexical item
  • Hall and Klein (2012): combine several such annotation schemes in a factored parser
  • CVGs extend these ideas from discrete representations to richer continuous ones

SLIDE 38

Experiments

  • Standard WSJ split, labeled F1
  • Based on simple PCFG with fewer states
  • Fast pruning of search space, few matrix-vector products
  • 3.8% higher F1

Parser                                                     Test, All Sentences
Stanford PCFG (Klein and Manning, 2003a)                   85.5
Stanford Factored (Klein and Manning, 2003b)               86.6
Factored PCFGs (Hall and Klein, 2012)                      89.4
Collins (Collins, 1997)                                    87.7
SSN (Henderson, 2004)                                      89.4
Berkeley Parser (Petrov and Klein, 2007)                   90.1
CVG (RNN) (Socher et al., ACL 2013)                        85.0
CVG (SU-RNN) (Socher et al., ACL 2013)                     90.4
Charniak - Self Trained (McClosky et al. 2006)             91.0
Charniak - Self Trained-ReRanked (McClosky et al. 2006)    92.1

SLIDE 39

SU-RNN / CVG [Socher, Bauer, Manning, Ng 2013]

Learns a soft notion of head words

Initialization:

[Figure: learned composition-matrix weights for the category pairs NP-CC, NP-PP, PP-NP, and PRP$-NP]

SLIDE 40

SU-RNN / CVG [Socher, Bauer, Manning, Ng 2013]

[Figure: learned composition-matrix weights for the category pairs ADJP-NP, ADVP-ADJP, JJ-NP, and DT-NP]

SLIDE 41

Analysis of resulting vector representations

All the figures are adjusted for seasonal variations

  • 1. All the numbers are adjusted for seasonal fluctuations
  • 2. All the figures are adjusted to remove usual seasonal patterns

Knight-Ridder wouldn’t comment on the offer

  • 1. Harsco declined to say what country placed the order
  • 2. Coastal wouldn’t disclose the terms

Sales grew almost 7% to $UNK m. from $UNK m.

  • 1. Sales rose more than 7% to $94.9 m. from $88.3 m.
  • 2. Sales surged 40% to UNK b. yen from UNK b.

SLIDE 42

Version 3: Compositionality Through Recursive Matrix-Vector Spaces

One way to make the composition function more powerful was untying the weights W. But what if words act mostly as operators, e.g. "very" in "very good"? Proposal: a new composition function.

Before: $p = \tanh\left(W \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + b\right)$

[Socher, Huval, Bhat, Manning, & Ng, 2012]

SLIDE 43

Compositionality Through Recursive Matrix-Vector Recursive Neural Networks

Before: $p = \tanh\left(W \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + b\right)$

Now: $p = \tanh\left(W \begin{bmatrix} C_2 c_1 \\ C_1 c_2 \end{bmatrix} + b\right)$

SLIDE 44

Matrix-vector RNNs

[Socher, Huval, Bhat, Manning, & Ng, 2012]

The children's (vector, matrix) pairs $(a, A)$ and $(b, B)$ are combined into a parent pair $(p, P)$, with the parent matrix computed as $P = W_M \begin{bmatrix} A \\ B \end{bmatrix}$
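A sketch of the matrix-vector composition (following the formulas above; W and W_M are assumed parameter names): each constituent carries a vector and a matrix, each child's matrix first transforms its sibling's vector, and the parent matrix is built from the children's matrices.

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))      # combines the matrix-transformed child vectors
W_M = rng.normal(scale=0.1, size=(d, 2 * d))    # combines the child matrices into the parent matrix
b = np.zeros(d)

def mv_compose(a, A, c, C):
    """(a, A) and (c, C) are the children's (vector, matrix) pairs; returns the parent pair."""
    p = np.tanh(W @ np.concatenate([C @ a, A @ c]) + b)   # each matrix operates on its sibling's vector
    P = W_M @ np.vstack([A, C])                           # parent matrix, again d x d
    return p, P
```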

SLIDE 45

Predicting Sentiment Distributions

Good example for non-linearity in language

SLIDE 46

Classification of Semantic Relationships

  • Can an MV-RNN learn how a large syntactic context conveys a semantic relationship?
  • My [apartment]e1 has a pretty large [kitchen]e2 → component-whole relationship (e2, e1)
  • Build a single compositional semantics for the minimal constituent including both terms

SLIDE 47

Classification of Semantic Relationships

Classifier   Features                                                                      F1
SVM          POS, stemming, syntactic patterns                                             60.1
MaxEnt       POS, WordNet, morphological features, noun compound system, thesauri,
             Google n-grams                                                                77.6
SVM          POS, WordNet, prefixes, morphological features, dependency parse features,
             Levin classes, PropBank, FrameNet, NomLex-Plus, Google n-grams,
             paraphrases, TextRunner                                                       82.2
RNN          –                                                                             74.8
MV-RNN       –                                                                             79.1
MV-RNN       POS, WordNet, NER                                                             82.4

SLIDE 48

Version 4: Recursive Neural Tensor Network

  • Fewer parameters than the MV-RNN
  • Allows the two word or phrase vectors to interact multiplicatively

Socher, Perelygin, Wu, Chuang, Manning, Ng, and Potts 2013

SLIDE 49

Beyond the bag of words: Sentiment detection

Is the tone of a piece of text positive, negative, or neutral?

  • A common assumption is that sentiment detection is "easy"
  • Detection accuracy for longer documents ~90%, BUT

… … loved … … … … … great … … … … … … impressed … … … … … … marvelous … … … …

SLIDE 50

Stanford Sentiment Treebank

  • 215,154 phrases labeled in 11,855 sentences
  • Can actually train and test compositions

http://nlp.stanford.edu:8080/sentiment/

SLIDE 51

Better Dataset Helped All Models

  • Hard negation cases are still mostly incorrect
  • We also need a more powerful model!

[Figure: bar chart (75-84% accuracy axis) comparing Bi NB, RNN, and MV-RNN when training with sentence labels vs. training with the Treebank]

SLIDE 52

Version 4: Recursive Neural Tensor Network

Idea: Allow both additive and mediated multiplicative interactions of vectors
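A sketch of the tensor composition (reconstructed from the published RNTN formulation rather than copied from the slides): alongside the usual W[c1; c2] term, each output dimension k has a tensor slice V[k] that mediates a bilinear interaction between the two child vectors.

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))
V = rng.normal(scale=0.01, size=(d, 2 * d, 2 * d))   # one 2d x 2d slice per output dimension
b = np.zeros(d)

def rntn_compose(c1, c2):
    """p_k = tanh( [c1;c2]^T V[k] [c1;c2] + (W [c1;c2] + b)_k )"""
    c = np.concatenate([c1, c2])
    bilinear = np.array([c @ V[k] @ c for k in range(d)])   # mediated multiplicative interactions
    return np.tanh(bilinear + W @ c + b)
```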

SLIDE 53

Recursive Neural Tensor Network

SLIDE 54

Recursive Neural Tensor Network

SLIDE 55

Recursive Neural Tensor Network

  • Use the resulting vectors in the tree as input to a classifier like logistic regression (see the sketch below)

  • Train all weights jointly with gradient descent
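For instance, a per-node softmax sentiment classifier might look like this sketch (5 classes matching the treebank's fine-grained labels; the weight name W_s is an assumption):

```python
import numpy as np

n_classes, d = 5, 4
rng = np.random.default_rng(0)
W_s = rng.normal(scale=0.1, size=(n_classes, d))   # classifier weights, trained jointly with the TreeRNN

def node_sentiment(h):
    """Softmax distribution over sentiment classes for one node vector h."""
    logits = W_s @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()
```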
SLIDE 56

Positive/Negative Results on Treebank

[Figure: bar chart (74-86% accuracy axis) of positive/negative accuracy for Bi NB, RNN, MV-RNN, and RNTN, training with sentence labels vs. training with the Treebank]

Classifying Sentences: Accuracy improves to 85.4

SLIDE 57

Experimental Results on Treebank

  • RNTN can capture constructions like X but Y
  • RNTN accuracy of 72%, compared to MV-RNN (65%), biword NB (58%), and RNN (54%)

SLIDE 58

Negation Results

When negating negatives, positive activation should increase!

Demo: http://nlp.stanford.edu:8080/sentiment/

SLIDE 59

Version 5: Improving Deep Learning Semantic Representations using a TreeLSTM

[Tai et al., ACL 2015; also Zhu et al. ICML 2015]

Goals:

  • Still trying to represent the meaning of a sentence as a location in a (high-dimensional, continuous) vector space
  • In a way that accurately handles semantic composition and sentence meaning

  • Generalizing the widely used chain-structured LSTM to trees
SLIDE 60

Long Short-Term Memory (LSTM) Units for Sequential Composition

Gates are vectors in [0,1]d multiplied element-wise for soft masking

SLIDE 61

Tree-Structured Long Short-Term Memory Networks [Tai et al., ACL 2015]

SLIDE 62

Tree-structured LSTM

Generalizes sequential LSTM to trees with any branching factor
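For reference, the Child-Sum Tree-LSTM update of Tai et al. (2015) (equations as given in the paper, not shown on this slide) for a node $j$ with children $C(j)$ is:

$\tilde{h}_j = \sum_{k \in C(j)} h_k$
$i_j = \sigma\big(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\big)$
$f_{jk} = \sigma\big(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\big)$ for each child $k \in C(j)$
$o_j = \sigma\big(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\big)$
$u_j = \tanh\big(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\big)$
$c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k$
$h_j = o_j \odot \tanh(c_j)$

There is one forget gate per child, which is what lets the tree cell selectively preserve or discard each child's memory.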

SLIDE 64

Results: Sentiment Analysis: Stanford Sentiment Treebank

Method                              Accuracy % (Fine-grain, 5 classes)
RNTN (Socher et al. 2013)           45.7
Paragraph-Vec (Le & Mikolov 2014)   48.7
DRNN (Irsoy & Cardie 2014)          49.8
LSTM                                46.4
Tree LSTM                           50.9

SLIDE 65

Results: Sentiment Analysis: Stanford Sentiment Treebank

Method                              Accuracy % (Pos/Neg)
RNTN (Socher et al. 2013)           85.4
Paragraph-Vec (Le & Mikolov 2014)   87.8
DRNN (Irsoy & Cardie 2014)          86.6
LSTM                                84.9
Tree LSTM                           88.0

SLIDE 66

Results: Semantic Relatedness, SICK 2014 (Sentences Involving Compositional Knowledge)

Method                                  Pearson correlation
Word vector average                     0.758
Meaning Factory (Bjerva et al. 2014)    0.827
ECNU (Zhao et al. 2014)                 0.841
LSTM                                    0.853
Tree LSTM                               0.868

SLIDE 67

Forget Gates: Selective State Preservation

  • Stripes = forget gate activations; more white ⇒ more preserved

SLIDE 68
  • 5. QCD-Aware Recursive Neural Networks for Jet Physics

Gilles Louppe, Kyunghyun Cho, Cyril Becot, Kyle Cranmer (2017)

SLIDE 69

Tree-to-tree Neural Networks for Program Translation

[Chen, Liu, and Song NeurIPS 2018]

  • Explores using tree-structured encoding and generation for translation between programming languages
  • In generation, you use attention over the source tree

SLIDE 70

Tree-to-tree Neural Networks for Program Translation

[Chen, Liu, and Song NeurIPS 2018]

SLIDE 71

Tree-to-tree Neural Networks for Program Translation

[Chen, Liu, and Song NeurIPS 2018]

SLIDE 72

Last minute project tips

  • Nothing works and everything is too slow → First, panic! Then:
  • Simplify model → Go back to basics: bag of vectors + NNet
  • Make a very small network and/or dataset for debugging
  • Once no bugs: increase model size
  • Make sure you can overfit to your training dataset
  • Plot your training and dev errors over training iterations
  • Once it's working, then try bigger, more complex models
  • Make sure to regularize with L2 and Dropout
  • Then if you have time, do some hyperparameter search
  • Talk to us in office hours!

SLIDE 73

The finish line is in sight!

Good luck with your final project!

Take care of your health!