Tackling the Limits of Deep Learning for NLP

SLIDE 1

Tackling the Limits of Deep Learning for NLP

Richard Socher, with Caiming Xiong, Stephen Merity, James Bradbury, Victor Zhong, Kazuma Hashimoto (Salesforce Research) and Hakan Inan, Khashayar Khosravi (Stanford)

Salesforce Research

SLIDE 2

The Limits of Single Task Learning

  • Great performance improvements
  • Projects start from random initialization
  • A single unsupervised task can’t fix it
  • How to express different tasks in the same framework, e.g.
    – sequence tagging
    – sentence-level classification
    – seq2seq?

SLIDE 3

Framework for Tackling NLP

A joint model for comprehensive QA

SLIDE 4

QA Examples

I: Mary walked to the bathroom.
I: Sandra went to the garden.
I: Daniel went back to the garden.
I: Sandra took the milk there.
Q: Where is the milk?
A: garden

I: Everybody is happy.
Q: What’s the sentiment?
A: positive

A: NNP VBZ DT NN IN NNP .

I: I think this model is incredible
Q: In French?
A: Je pense que ce modèle est incroyable.

I: (image of bananas)
Q: What color are the bananas?
A: Green.

Move from {xi,yi} to {xi,qi,yi}

SLIDE 5

First of Six Major Obstacles

  • For NLP there is no single model architecture with consistent state-of-the-art results across tasks:

Task                              State-of-the-art model
Question answering (bAbI)         Strongly Supervised MemNN (Weston et al., 2015)
Sentiment analysis (SST)          Tree-LSTMs (Tai et al., 2015)
Part-of-speech tagging (PTB-WSJ)  Bi-directional LSTM-CRF (Huang et al., 2015)

SLIDE 6

Tackling Obstacle 1: Dynamic Memory Network

Answer module, Question Module, Episodic Memory Module, Input Module

[Figure: DMN overview. Input (sentences s1–s8): “Mary got the milk there. John moved to the bedroom. Sandra went back to the kitchen. Mary travelled to the hallway. John got the football there. John went to the hallway. John put down the football. Mary went to the garden.” Question q: “Where is the football?” Gate values over e1–e8 for pass 1 (0.0 0.3 0.0 0.0 0.0 0.9 0.0 0.0) and pass 2 (0.3 0.0 0.0 0.0 0.0 0.0 1.0 0.0) produce memories m1 and m2; the answer module emits “hallway <EOS>”.]
SLIDE 7

The Modules: Episodic Memory

Answer module, Question Module, Semantic Memory Module, Episodic Memory Module, Input Module

[Figure: the same DMN overview. Input sentences s1–s8 (“Mary got the milk there. … Mary went to the garden.”), question “Where is the football?”, GloVe word vectors w1 … wT, two passes of gates over e1–e8 giving memories m1 and m2, answer “hallway <EOS>”.]

Gated GRU update over the facts during pass i:

h_t^i = g_t^i GRU(s_t, h_{t−1}^i) + (1 − g_t^i) h_{t−1}^i

Last hidden state: m_t

SLIDE 8

The Modules: Episodic Memory

  • Gates are activated if a sentence is relevant to the question or the memory
  • When the end of the input is reached, the relevant facts are summarized in another GRU

Gate features for fact s_t, question q, and previous memory m^{i−1}:

z_t^i = [ s_t ∘ q ; s_t ∘ m^{i−1} ; |s_t − q| ; |s_t − m^{i−1}| ]
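These two pieces are easy to make concrete. Below is a minimal PyTorch sketch of one episodic pass, assuming precomputed sentence vectors; the hidden size and the two-layer gate network are illustrative stand-ins, not the authors' released implementation:

```python
import torch
import torch.nn as nn

hidden_dim = 80  # illustrative size

# Two-layer gate network producing a scalar gate from the feature vector z
gate_net = nn.Sequential(
    nn.Linear(4 * hidden_dim, hidden_dim), nn.Tanh(),
    nn.Linear(hidden_dim, 1), nn.Sigmoid())
gru = nn.GRUCell(hidden_dim, hidden_dim)

def episode(facts, q, m_prev):
    """One pass over the fact vectors; returns the episode summary."""
    h = torch.zeros(1, hidden_dim)
    for s in facts:  # each s: (1, hidden_dim) sentence vector
        # Gate features: z = [s o q ; s o m ; |s - q| ; |s - m|]
        z = torch.cat([s * q, s * m_prev,
                       (s - q).abs(), (s - m_prev).abs()], dim=1)
        g = gate_net(z)  # scalar gate in (0, 1)
        # Gated update: h_t = g * GRU(s_t, h_{t-1}) + (1 - g) * h_{t-1}
        h = g * gru(s, h) + (1 - g) * h
    return h  # last hidden state; a further GRU over episodes yields memory m^i
```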

SLIDE 9

Related work

  • Sequence to Sequence (Sutskever et al. 2014)
  • Neural Turing Machines (Graves et al. 2014)
  • Teaching Machines to Read and Comprehend (Hermann et al. 2015)
  • Learning to Transduce with Unbounded Memory (Grefenstette 2015)
  • Structured Memory for Neural Turing Machines (Wei Zhang 2015)
  • Memory Networks (Weston et al. 2015)
  • End to end memory networks (Sukhbaatar et al. 2015)

→ Main difference: sequence models for all functions in the DMN, allowing for greater generality in the tasks that can be “answered”

SLIDE 10

Comparison to MemNets

Similarities:

  • MemNets and DMNs have input, scoring, attention and response mechanisms

Differences:

  • For input representations, MemNets use bag-of-words, or nonlinear or linear embeddings that explicitly encode position
  • MemNets iteratively run functions for attention and response
  • DMNs show that neural sequence models can be used for input representation, attention and response mechanisms → naturally captures position and temporality
  • Enables a broader range of applications
SLIDE 11

Analysis of Number of Episodes

  • How many attention + memory passes are needed in the episodic memory?
  • Results on the bAbI dataset and Stanford Sentiment Treebank:

Max passes   task 3 (three facts)   task 7 (count)   task 8 (lists/sets)   sentiment (fine-grained)
0 passes     0                      48.8             33.6                  50.0
1 pass       0                      48.8             54.0                  51.5
2 passes     16.7                   49.1             55.6                  52.1
3 passes     64.7                   83.4             83.4                  50.1
5 passes     95.2                   96.9             96.5                  N/A

SLIDE 12

Analysis of Attention for Sentiment

  • Sharper attention when 2 passes are allowed.
  • Examples that are wrong with just one pass
SLIDE 13

Analysis of Attention for Sentiment

  • Examples where full sentence context from the first pass changes attention to words more relevant for the final prediction

SLIDE 14

Analysis of Attention for Sentiment

  • Examples where full sentence context from the first pass changes attention to words more relevant for the final prediction

SLIDE 15

Analysis of Attention for Sentiment

SLIDE 16

Modularization Allows for Different Inputs

[Figure: the same Episodic Memory, Answer and Question modules paired with different Input Modules: (a) Text Question-Answering, (b) Visual Question-Answering.]

(a) Text QA. I: “John moved to the garden. John got the apple there. John moved to the kitchen. Sandra picked up the milk there. John dropped the apple. John moved to the office.” Q: Where is the apple? A: Kitchen

(b) Visual QA. Q: What kind of tree is in the background? A: Palm

Dynamic Memory Networks for Visual and Textual Question Answering, Caiming Xiong, Stephen Merity, Richard Socher

SLIDE 17

Input Module for Images

[Figure: input module for images. A CNN extracts a 14×14 grid of 512-d visual features; a linear layer W embeds each feature, and a GRU-based input fusion layer runs over the patch sequence. Labels: CNN, visual feature extraction, feature embedding, input fusion layer, Input Module.]
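A rough sketch of this input pipeline, under stated assumptions: sizes are illustrative, and a unidirectional GRU stands in for the bidirectional input fusion layer used in the paper:

```python
import torch
import torch.nn as nn

proj = nn.Linear(512, 80)               # feature embedding (sizes illustrative)
gru = nn.GRU(80, 80, batch_first=True)  # input fusion layer over the patch sequence

def image_facts(cnn_features):
    """cnn_features: (1, 196, 512), the flattened 14x14 grid of CNN features."""
    emb = torch.tanh(proj(cnn_features))
    facts, _ = gru(emb)                 # (1, 196, 80): one "fact" per image patch
    return facts
```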

SLIDE 18

Accuracy: Visual Question Answering

Accuracy on VQA:

Method     All (test-dev)   Y/N    Other   Num    All (test-std)
Image      28.1             64.0   3.8     0.4    −
Question   48.1             75.7   27.1    36.7   −
Q+I        52.6             75.6   37.4    33.7   −
LSTM Q+I   53.7             78.9   36.4    35.2   54.1
ACK        55.7             79.2   40.1    36.1   56.0
iBOWIMG    55.7             76.5   42.6    35.0   55.9
DPPnet     57.2             80.7   41.7    37.2   57.4
D-NMN      57.9             80.5   43.1    37.4   58.0
SAN        58.7             79.3   46.1    36.6   58.9
DMN+       60.3             80.5   48.3    36.8   60.4

VQA test-dev and test-standard: Antol et al. (2015); ACK: Wu et al. (2015); iBOWIMG: Zhou et al. (2015); DPPnet: Noh et al. (2015); D-NMN: Andreas et al. (2016); SAN: Yang et al. (2015)
SLIDE 19

Attention Visualization

  • What is this sculpture made out of? Answer: metal
  • What is the pattern on the cat’s fur on its tail? Answer: stripes
  • Did the player hit the ball? Answer: yes
  • What color are the bananas? Answer: green

[Figure 4: examples of qualitative results of attention for VQA. Each image (left) is shown with the attention that the episodic memory module places on it.]

SLIDE 20

Attention Visualization

  • What is the main color on the bus? Answer: blue
  • How many pink flags are there? Answer: 2
  • What type of trees are in the background? Answer: pine
  • Is this in the wild? Answer: no

SLIDE 21

Attention Visualization

  • Which man is dressed more flamboyantly? Answer: right
  • What time of day was this picture taken? Answer: night
  • What is the boy holding? Answer: surfboard
  • Who is on both photos? Answer: girl

SLIDE 22

SLIDE 23

  • DEMO
SLIDE 24

Obstacle 2: Joint Many-task Learning

  • Fully joint multitask learning* is hard:
    – Usually restricted to lower layers
    – Usually helps only if tasks are related
    – Often hurts performance if tasks are not related

* meaning: same decoder/classifier, and not only transfer learning with source–target task pairs

SLIDE 25

Tackling Joint Training

  • A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
    Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka & Richard Socher

  • Final model →

[Figure: the joint many-task stack applied to Sentence1 and Sentence2. Word level: word representations. Syntactic level: POS, CHUNK, DEP. Semantic level: relatedness encoder → relatedness, entailment encoder → entailment.]
SLIDE 26

Model Details

  • Includes character n-grams and short-circuit connections
  • State-of-the-art purely feedforward parser

[Figure: the per-task layers share one growing LSTM stack. POS tagging: LSTM states h(1)_t feed softmaxes producing y(pos)_t, with label embeddings passed upward. Chunking: h(2)_t conditions on h(1)_t and y(pos)_t to predict y(chk)_t. Dependency parsing: h(3)_t conditions on the layers below. Semantic relatedness: temporal max-pooling over h(4)_t for Sentence1 and Sentence2, feature extraction, then a softmax for y(rel).]
SLIDE 27

Training Details: Regularized Idea

Chunking training objective (with successive regularization toward the previous POS parameters θ′_POS):

J = − Σ_s Σ_t log p(y_t^(2) = α | h_t^(2)) + λ ‖W_chunk‖² + δ ‖θ_POS − θ′_POS‖²

Entailment training objective:

J = − Σ_(s,s′) log p(y^(5)_(s,s′) = α | h_s^(5), h_s′^(5)) + λ ‖W_ent‖² + δ ‖θ_rel − θ′_rel‖²
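As a hedged sketch of how such an objective looks in code (names like pos_params_prev and the hyperparameter values are illustrative, not the paper's settings), the chunking loss combines the negative log-likelihood, an L2 penalty, and the successive-regularization term that keeps the POS parameters close to their values from the previous epoch:

```python
import torch

def chunking_loss(log_probs, W_chunk, pos_params, pos_params_prev,
                  lambda_=1e-4, delta=1e-2):
    """log_probs: log p of the gold chunk labels; pos_params(_prev): lists of tensors."""
    nll = -log_probs.sum()                            # - sum log p(y_t = alpha | h_t)
    l2 = lambda_ * W_chunk.pow(2).sum()               # lambda * ||W_chunk||^2
    succ = delta * sum((p - p0).pow(2).sum()          # delta * ||theta_POS - theta'_POS||^2
                       for p, p0 in zip(pos_params, pos_params_prev))
    return nll + l2 + succ
```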

SLIDE 28

New State of the Art on 4 of 5 Tasks

Table 2: POS tagging results.

Method                      Acc.
JMT_all                     97.55
Ling et al. (2015)          97.78
Kumar et al. (2016)         97.56
Ma & Hovy (2016)            97.55
Søgaard (2011)              97.50
Collobert et al. (2011)     97.29
Tsuruoka et al. (2011)      97.28
Toutanova et al. (2003)     97.27

Table 3: Chunking results.

Method                       F1
JMT_AB                       95.77
Søgaard & Goldberg (2016)    95.56
Suzuki & Isozaki (2008)      95.15
Collobert et al. (2011)      94.32
Kudo & Matsumoto (2001)      93.91
Tsuruoka et al. (2011)       93.81

Table 4: Dependency parsing results.

Method                  UAS     LAS
JMT_all                 94.67   92.90
Single                  93.35   91.42
Andor et al. (2016)     94.61   92.79
Alberti et al. (2015)   94.23   92.36
Weiss et al. (2015)     93.99   92.05
Dyer et al. (2015)      93.10   90.90
Bohnet (2010)           92.88   90.71

Table 5: Semantic relatedness results.

Method               MSE
JMT_all              0.233
JMT_DE               0.238
Zhou et al. (2016)   0.243
Tai et al. (2015)    0.253

Table 6: Textual entailment results.

Method                      Acc.
JMT_all                     86.2
JMT_DE                      86.8
Yin et al. (2016)           86.2
Lai & Hockenmaier (2014)    84.6
SLIDE 29

Obstacle 3: No Zero-Shot Word Predictions

  • Answers can only be predicted if they were seen during training and are part of the softmax
  • But it’s natural to learn new words in an active conversation, and systems should be able to pick them up

SLIDE 30

Tackling Obstacle by Predicting Unseen Words

  • Idea: a mixture model of softmax and pointers:
  • Pointer Sentinel Mixture Models by Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

p(Yellen) = g pvocab(Yellen) + (1 − g) pptr(Yellen)

[Figure: predicting the next word of “Chair Janet Yellen … raised rates . Ms. ???”: the pointer distribution pptr puts mass on “Yellen” in the context, the RNN softmax pvocab covers the full vocabulary (aardvark … Bernanke … Rosenthal … Yellen … zebra), and the sentinel gate g mixes the two.]
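A minimal sketch of the mixture computation (assumed shapes, not the released implementation). Since the attention normalizes over the context positions plus the sentinel, the pointer component below already carries its (1 − g) weight:

```python
import torch

def pointer_sentinel(p_vocab, attn_logits, context_ids, vocab_size):
    """p_vocab: (vocab_size,) softmax distribution.
    attn_logits: (ctx_len + 1,) scores over context positions plus the sentinel.
    context_ids: (ctx_len,) LongTensor of word ids appearing in the context."""
    a = torch.softmax(attn_logits, dim=0)
    g = a[-1]                                 # sentinel mass acts as the gate g
    p_ptr = torch.zeros(vocab_size)
    p_ptr.index_add_(0, context_ids, a[:-1])  # scatter pointer mass onto word ids
    # p(y) = g * p_vocab(y) + (1 - g) * p_ptr(y); a[:-1] already sums to (1 - g)
    return g * p_vocab + p_ptr
```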

SLIDE 31

Pointer-Sentinel Model

[Figure: pointer-sentinel architecture. An RNN over the embedded context produces the softmax vocabulary distribution pvocab(yN | w1, …, wN−1). A query vector attends over the context positions plus a sentinel, giving the pointer distribution pptr(yN | w1, …, wN−1). The mixture gate g combines both into the output distribution p(yN | w1, …, wN−1).]

SLIDE 32

Pointer Sentinel for Language Modeling

Model                                                  Parameters   Validation   Test
Mikolov & Zweig (2012) - KN-5                          2M‡          −            141.2
Mikolov & Zweig (2012) - KN5 + cache                   2M‡          −            125.7
Mikolov & Zweig (2012) - RNN                           6M‡          −            124.7
Mikolov & Zweig (2012) - RNN-LDA                       7M‡          −            113.7
Mikolov & Zweig (2012) - RNN-LDA + KN-5 + cache        9M‡          −            92.0
Pascanu et al. (2013a) - Deep RNN                      6M           −            107.5
Cheng et al. (2014) - Sum-Prod Net                     5M‡          −            100.0
Zaremba et al. (2014) - LSTM (medium)                  20M          86.2         82.7
Zaremba et al. (2014) - LSTM (large)                   66M          82.2         78.4
Gal (2015) - Variational LSTM (medium, untied)         20M          81.9 ± 0.2   79.7 ± 0.1
Gal (2015) - Variational LSTM (medium, untied, MC)     20M          −            78.6 ± 0.1
Gal (2015) - Variational LSTM (large, untied)          66M          77.9 ± 0.3   75.2 ± 0.2
Gal (2015) - Variational LSTM (large, untied, MC)      66M          −            73.4 ± 0.0
Kim et al. (2016) - CharCNN                            19M          −            78.9
Zilly et al. (2016) - Variational RHN                  32M          72.8         71.3
Zoneout + Variational LSTM (medium)                    20M          84.4         80.6
Pointer Sentinel-LSTM (medium)                         21M          72.4         70.9

SLIDE 33

Obstacle 4: Duplicate Word Representations

  • Different encodings for the encoder (word2vec or GloVe word vectors) and the decoder (softmax classification weights for words)
  • Duplicate parameters/meaning

[Figure: the pointer-sentinel architecture diagram from Slide 31, repeated.]

SLIDE 34

Tackling Obstacle by Tying Word Vectors

  • Simple but theoretically motivated idea: tie the word vectors and train the single set of weights jointly
  • Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling, Hakan Inan, Khashayar Khosravi, Richard Socher
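In code, tying reduces to sharing one matrix between the embedding and the output classifier. A minimal PyTorch sketch with illustrative sizes (this requires the embedding and final hidden dimensions to match):

```python
import torch.nn as nn

vocab, dim = 10000, 400                   # illustrative sizes
embed = nn.Embedding(vocab, dim)          # encoder word vectors, (vocab, dim)
decoder = nn.Linear(dim, vocab, bias=False)
decoder.weight = embed.weight             # one shared matrix, halving word parameters
```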

SLIDE 35

Language Modeling With Tying Word Vectors

MODEL                                                  PARAMETERS   VALIDATION   TEST
KN-5 (Mikolov & Zweig)                                 2M           −            141.2
KN-5 + Cache (Mikolov & Zweig)                         2M           −            125.7
RNN (Mikolov & Zweig)                                  6M           −            124.7
RNN+LDA (Mikolov & Zweig)                              7M           −            113.7
RNN+LDA+KN-5+Cache (Mikolov & Zweig)                   9M           −            92.0
Deep RNN (Pascanu et al., 2013a)                       6M           −            107.5
Sum-Prod Net (Cheng et al., 2014)                      5M           −            100.0
LSTM (medium) (Zaremba et al., 2014)                   20M          86.2         82.7
LSTM (large) (Zaremba et al., 2014)                    66M          82.2         78.4
VD-LSTM (medium, untied) (Gal, 2015)                   20M          81.9 ± 0.2   79.7 ± 0.1
VD-LSTM (medium, untied, MC) (Gal, 2015)               20M          −            78.6 ± 0.1
VD-LSTM (large, untied) (Gal, 2015)                    66M          77.9 ± 0.3   75.2 ± 0.2
VD-LSTM (large, untied, MC) (Gal, 2015)                66M          −            73.4 ± 0.0
CharCNN (Kim et al., 2015)                             19M          −            78.9
VD-RHN (Zilly et al., 2016)                            32M          72.8         71.3
Pointer Sentinel-LSTM (medium) (Merity et al., 2016)   21M          72.4         70.9
38 Large LSTMs (Zaremba et al., 2014)                  2.51B        71.9         68.7
10 Large VD-LSTMs (Gal, 2015)                          660M         −            68.7
VD-LSTM + REAL (medium)                                14M          75.7         73.2
VD-LSTM + REAL (large)                                 51M          71.1         68.5

SLIDE 36

Obstacle 5: Questions Have Input-Independent Representations

[Figure: DCN pipeline. A document encoder and a question encoder feed a coattention encoder; a dynamic pointer decoder outputs start index 49 and end index 51, extracting the span “steam turbine plants”. Question: “What plants create most electric power?” Document: “The weight of boilers and condensers generally makes the power-to-weight ... However, most electric power is generated using steam turbine plants, so that indirectly the world's industry is ...”]

  • Interdependence needed for a comprehensive QA model
  • Dynamic Coattention Networks for Question Answering by Caiming Xiong, Victor Zhong, Richard Socher

SLIDE 37

Coattention Encoder

[Figure: coattention encoder. Document encoding D (m+1 positions) and question encoding Q (n+1 positions, sentinel appended) are multiplied into an affinity matrix; normalizing it along each dimension gives attention weights A^Q and A^D, products give context summaries C^Q and C^D, and a final bi-LSTM over the concatenation produces U.]
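A rough numpy sketch of the coattention computation, assuming document and question encodings D (d × (m+1)) and Q (d × (n+1)) with sentinel columns already appended:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def coattention(D, Q):
    L = D.T @ Q                      # affinity matrix, (m+1) x (n+1)
    A_Q = softmax(L, axis=0)         # attention over document, per question word
    A_D = softmax(L.T, axis=0)       # attention over question, per document word
    C_Q = D @ A_Q                    # question-side context summaries, d x (n+1)
    C_D = np.vstack([Q, C_Q]) @ A_D  # coattention context, 2d x (m+1)
    return C_D                       # fused with D by a bi-LSTM to produce U
```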

SLIDE 38

Dynamic Decoder

[Figure: dynamic decoder. Over coattention encodings u48 … u52 of “… using steam turbine plant , …”, an LSTM carries state h_i across iterations; Highway Maxout Networks (HMN) score every position conditioned on h_i and the previous span vectors u_{s_{i−1}}, u_{e_{i−1}}, and argmax re-estimates the start (s_i: 49, “steam”) and end (e_i: 51).]
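A schematic of the iterative decoding loop; simple linear scorers stand in for the paper's Highway Maxout Networks, so this illustrates only the control flow, not the DCN itself:

```python
import torch
import torch.nn as nn

d = 200                        # illustrative hidden size; U has 2d-dim rows
lstm = nn.LSTMCell(4 * d, d)   # consumes [u_{s_{i-1}} ; u_{e_{i-1}}]
score_s = nn.Linear(3 * d, 1)  # stand-in for HMN_start
score_e = nn.Linear(3 * d, 1)  # stand-in for HMN_end

def decode(U, iters=4):
    """U: (m, 2d) coattention encodings; returns an answer span (start, end)."""
    m = U.size(0)
    s, e = 0, m - 1                       # initial guesses
    h = torch.zeros(1, d)
    c = torch.zeros(1, d)
    for _ in range(iters):
        h, c = lstm(torch.cat([U[s], U[e]]).unsqueeze(0), (h, c))
        feats = torch.cat([U, h.expand(m, -1)], dim=1)  # (m, 3d)
        s = score_s(feats).argmax().item()  # re-estimate start given new state
        e = score_e(feats).argmax().item()  # then end
    return s, e
```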

SLIDE 39

Stanford Question Answering Dataset

SLIDE 40

Results on the SQuAD Competition

Model                                     Dev EM   Dev F1   Test EM   Test F1
Ensemble
DCN (Ours)                                70.3     79.4     71.2      80.4
Microsoft Research Asia ∗                 −        −        69.4      78.3
Allen Institute ∗                         69.2     77.8     69.9      78.1
Singapore Management University ∗         67.6     76.8     67.9      77.0
Google NYC ∗                              68.2     76.7     −         −
Single model
DCN (Ours)                                65.4     75.6     66.2      75.9
Microsoft Research Asia ∗                 65.9     75.2     65.5      75.0
Google NYC ∗                              66.4     74.9     −         −
Singapore Management University ∗         −        −        64.7      73.7
Carnegie Mellon University ∗              −        −        62.5      73.3
Dynamic Chunk Reader (Yu et al., 2016)    62.5     71.2     62.5      71.0
Match-LSTM (Wang & Jiang, 2016)           59.1     70.0     59.5      70.3
Baseline (Rajpurkar et al., 2016)         40.0     51.0     40.4      51.0
Human (Rajpurkar et al., 2016)            81.4     91.0     82.3      91.2

Results are as of the ICLR submission deadline. See https://rajpurkar.github.io/SQuAD-explorer/ for the latest results.

SLIDE 41

Dynamic Decoder Visualization

SLIDE 42

Obstacle 6: RNNs are Slow

  • RNNs are the basic building block for deep NLP
  • Idea: take the best and parallelizable parts of RNNs and CNNs
  • Quasi-Recurrent Neural Networks by James Bradbury, Stephen Merity, Caiming Xiong & Richard Socher

SLIDE 43

Quasi-Recurrent Neural Network

  • Convolutions for parallelism across time →
  • Element-wise gated recurrence for parallelism across channels:

[Figure: layer diagrams for the LSTM (LSTM/Linear blocks), the CNN (convolution + max-pool), and the QRNN (convolution + fo-pool).]

Z = tanh(W_z ∗ X)
F = σ(W_f ∗ X)
O = σ(W_o ∗ X)

With filter width 2, each output depends only on the previous and current inputs:

z_t = tanh(W¹_z x_{t−1} + W²_z x_t)
f_t = σ(W¹_f x_{t−1} + W²_f x_t)
o_t = σ(W¹_o x_{t−1} + W²_o x_t)

and the recurrent pooling is element-wise:

h_t = f_t ⊙ h_{t−1} + (1 − f_t) ⊙ z_t
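A compact sketch of a single QRNN layer with filter width 2 and f-pooling (shapes and the use of nn.Conv1d are assumptions for illustration, not the authors' released code); the convolutions over time run in parallel, and only the cheap element-wise recurrence is sequential:

```python
import torch
import torch.nn as nn

d_in, d_hid = 300, 300
conv_z = nn.Conv1d(d_in, d_hid, kernel_size=2, padding=1)
conv_f = nn.Conv1d(d_in, d_hid, kernel_size=2, padding=1)

def qrnn(X):
    """X: (batch, d_in, T); returns hidden states (batch, d_hid, T)."""
    T = X.size(2)
    # Keeping the first T outputs makes the convolution causal: position t
    # sees only x_{t-1} (zero-padded at t = 0) and x_t.
    Z = torch.tanh(conv_z(X))[:, :, :T]
    F = torch.sigmoid(conv_f(X))[:, :, :T]
    h = torch.zeros(X.size(0), d_hid)
    hs = []
    for t in range(T):  # element-wise recurrence: h_t = f_t*h_{t-1} + (1-f_t)*z_t
        h = F[:, :, t] * h + (1 - F[:, :, t]) * Z[:, :, t]
        hs.append(h)
    return torch.stack(hs, dim=2)
```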

SLIDE 44

Q-RNNs for Language Modeling

  • Better
  • Faster

Model                                                        Parameters   Validation   Test
LSTM (medium) (Zaremba et al., 2014)                         20M          86.2         82.7
Variational LSTM (medium) (Gal & Ghahramani, 2016)           20M          81.9         79.7
LSTM with CharCNN embeddings (Kim et al., 2016)              19M          −            78.9
Zoneout + Variational LSTM (medium) (Merity et al., 2016)    20M          84.4         80.6
Our models
LSTM (medium)                                                20M          85.7         82.0
QRNN (medium)                                                18M          82.9         79.9
QRNN + zoneout (p = 0.1) (medium)                            18M          82.1         78.3

Speedup (rows: batch size; columns: sequence length):

Batch size   32     64     128     256     512
8            5.5x   8.8x   11.0x   12.4x   16.9x
16           5.5x   6.7x   7.8x    8.3x    10.8x
32           4.2x   4.5x   4.9x    4.9x    6.4x
64           3.0x   3.0x   3.0x    3.0x    3.7x
128          2.1x   1.9x   2.0x    2.0x    2.4x
256          1.4x   1.4x   1.3x    1.3x    1.3x

SLIDE 45

Q-RNNs for Sentiment Analysis

  • Better and faster than LSTMs
  • More interpretable
  • Example: review starts out positive
    – At 117: “not exactly a bad story”
    – At 158: “I recommend this movie to everyone, even if you’ve never played the game”

Model                                                 Time / Epoch (s)   Test Acc (%)
BSVM-bi (Wang & Manning, 2012)                        −                  91.2
2-layer sequential BoW CNN (Johnson & Zhang, 2014)    −                  92.3
Ensemble of RNNs and NB-SVM (Mesnil et al., 2014)     −                  92.6
2-layer LSTM (Longpre et al., 2016)                   −                  87.6
Residual 2-layer bi-LSTM (Longpre et al., 2016)       −                  90.1
Our models
Deeply connected 4-layer LSTM (cuDNN optimized)       480                90.9
Deeply connected 4-layer QRNN                         150                91.4
D.C. 4-layer QRNN with k = 4                          160                91.1

SLIDE 46

Comprehensive Question Answering

  • Framework for tackling the limits of deep NLP

[Figure: montage of the components from this talk: the QRNN (convolution + fo-pool), the coattention encoder (D, Q, products, bi-LSTMs, U), the joint many-task stack (word, syntactic and semantic levels over Sentence1/Sentence2), and the pointer-sentinel mixture p(Yellen) = g pvocab(Yellen) + (1 − g) pptr(Yellen).]
SLIDE 47

SLIDE 48

Tackling Obstacle 1: Dynamic Memory Network

[Figure: the DMN architecture diagram from Slide 6, repeated: Input, Question, Episodic Memory and Answer modules on the “Where is the football?” example, answering “hallway <EOS>”.]
SLIDE 49

The Modules: Input

[Figure: DMN module overview, highlighting the Input Module.]

Input Module: “Mary got the milk there. John moved to the bedroom. Sandra went back to the kitchen. Mary travelled to the hallway. John got the football there. John went to the hallway. John put down the football. Mary went to the garden.” (sentences s1 … s8, words w1 … wT)

Standard GRU. The last hidden state of each sentence is accessible.

SLIDE 50

The Modules: Question

[Figure: DMN module overview, highlighting the Question Module.]

Question Module: “Where is the football?”

Each question word v_t is encoded via q_t = GRU(v_t, q_{t−1}); the question vector q is defined as the final hidden state.

SLIDE 51

The Modules: Episodic Memory

  • If the memory summary is insufficient to answer the question, repeat the sequence over the input

[Figure: DMN module overview, highlighting the Episodic Memory Module: gates over e1–e8 for pass 1 (0.0 0.3 0.0 0.0 0.0 0.9 0.0 0.0) and pass 2 (0.3 0.0 0.0 0.0 0.0 0.0 1.0 0.0) produce memories m1 and m2.]
SLIDE 52

The Modules: Answer

a_t = GRU([y_{t−1}; q], a_{t−1}),   y_t = softmax(W^(a) a_t)

[Figure: DMN module overview, highlighting the Answer Module generating “hallway <EOS>”.]
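A small illustrative sketch of this decoder (single-example shapes; the sizes and the fixed-length stopping rule are assumptions, not the paper's exact setup):

```python
import torch
import torch.nn as nn

hidden, vocab = 80, 180                  # illustrative sizes
gru = nn.GRUCell(vocab + hidden, hidden) # consumes [y_{t-1} ; q]
W_a = nn.Linear(hidden, vocab)

def answer(q, m_final, max_len=5):
    """q: (1, hidden) question vector; m_final: (1, hidden) last memory."""
    a = m_final                          # decoder state initialized with the memory
    y = torch.zeros(1, vocab)            # previous prediction, initially empty
    outputs = []
    for _ in range(max_len):
        a = gru(torch.cat([y, q], dim=1), a)   # a_t = GRU([y_{t-1}; q], a_{t-1})
        y = torch.softmax(W_a(a), dim=1)       # y_t = softmax(W^(a) a_t)
        outputs.append(y.argmax(dim=1))        # a real decoder stops at <EOS>
    return outputs
```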
SLIDE 53

bAbI 1k, with gate supervision

Task                           MemNN   DMN     Task                         MemNN   DMN
1: Single Supporting Fact      100     100     11: Basic Coreference        100     99.9
2: Two Supporting Facts        100     98.2    12: Conjunction              100     100
3: Three Supporting Facts      100     95.2    13: Compound Coreference     100     99.8
4: Two Argument Relations      100     100     14: Time Reasoning           99      100
5: Three Argument Relations    98      99.3    15: Basic Deduction          100     100
6: Yes/No Questions            100     100     16: Basic Induction          100     99.4
7: Counting                    85      96.9    17: Positional Reasoning     65      59.6
8: Lists/Sets                  91      96.5    18: Size Reasoning           95      95.3
9: Simple Negation             100     100     19: Path Finding             36      34.5
10: Indefinite Knowledge       98      97.5    20: Agent’s Motivations      100     100

Mean Accuracy (%)              93.3    93.6

SLIDE 54

Experiments: Sentiment Analysis

Test accuracies on the Stanford Sentiment Treebank:

  • MV-RNN and RNTN: Socher et al. (2013)
  • DCNN: Kalchbrenner et al. (2014)
  • PVec: Le & Mikolov (2014)
  • CNN-MC: Kim (2014)
  • DRNN: Irsoy & Cardie (2015)
  • CT-LSTM: Tai et al. (2015)

Model     Binary   Fine-grained
MV-RNN    82.9     44.4
RNTN      85.4     45.7
DCNN      86.8     48.5
PVec      87.8     48.7
CNN-MC    88.1     47.4
DRNN      86.6     49.8
CT-LSTM   88.0     51.0
DMN       88.6     52.1
SLIDE 55

Experiments: POS Tagging

  • PTB WSJ, standard splits
  • Episodic memory does not require multiple passes; a single pass is enough

Model              Acc (%)
SVMTool            97.15
Sogaard            97.27
Suzuki et al.      97.40
Spoustova et al.   97.44
SCNN               97.50
DMN                97.56